Schedule

For the Spring 2023 quarter we will meet twice a week, on Wednesdays and Fridays from 12 to 3 pm, in YORK 3050 (map). Clicking on the topics below will take you to supporting class content, including video lectures, hands-on “lab session” sheets, walk-through screencasts, required reading material, and homework assignments.

#	Date	Topics for Spring 2023
0	-	Getting Oriented Course introduction, Learning goals & expectations, Meet the instructional team. Set up your computer with required software.
1	Wed 04/05/23	Welcome to bioinformatics Biology is an information science. History of Bioinformatics. Types of data. Application areas and introduction to upcoming course segments. Hands on with major Bioinformatics databases and key online NCBI and EBI resources.
2	Fri 04/07/23	Sequence alignment fundamentals, algorithms and applications Homology. Sequence similarity. Local and global alignment: classic Needleman-Wunsch, Smith-Waterman, and BLAST heuristic approaches. Hands on with dot plots, Needleman-Wunsch, and BLAST algorithms highlighting their utility and limitations.
3*	Wed 04/12/23	Project: Find a gene project assignment (Part 1) Principles of database searching, due in 2 weeks. (Part 2) Sequence analysis, structure analysis and general data analysis with R due at the end of the quarter.
3	Wed 04/12/23	Advanced sequence alignment and database searching Detecting remote sequence similarity. Database searching beyond BLAST. Substitution matrices. Using PSI-BLAST, Profiles, and HMMs. Protein structure comparisons as a gold standard.
4	Fri 04/14/23	Bioinformatics data analysis with R Why do we use R for bioinformatics? R language basics and the RStudio IDE. Major R data structures and functions. Using R interactively from the RStudio console. Introducing Rmarkdown documents.
5	Wed 04/19/23	Data exploration and visualization in R The exploratory data analysis mindset. Data visualization best practices. Simple base graphics (including scatterplots, histograms, bar graphs, dot charts, boxplots, and heatmaps). Building more complex charts with ggplot.
6	Fri 04/21/23	Why, when and how of writing your own R functions The basics of writing your own functions that promote code robustness, reduce duplication, and facilitate code re-use. Extending functionality and utility with R packages from CRAN and Bioconductor. Working with Bio3D for molecular data.
7	Wed 04/26/23	Introduction to machine learning for bioinformatics Unsupervised learning. K-means clustering. Hierarchical clustering. Heatmap representations. Dimensionality reduction. Principal Component Analysis (PCA).
8	Fri 04/28/23	Unsupervised learning mini-project Longer hands-on session with unsupervised learning analysis of cancer cells. Practical considerations and best practices for the analysis and visualization of high dimensional datasets.
9	Wed 05/03/23	Structural bioinformatics Protein structure function relationships. Protein structure and visualization resources. Modeling energy as a function of structure. Homology modeling. AlphaFold. Predicting functional dynamics. Inferring protein function from structure.
10	Fri 05/05/23	Halloween candy mini-project A fun and topical mini-project with unsupervised learning analysis of halloween_candy. Practical considerations and best practices for the exploratory analysis and visualization of high dimensional datasets.
11	Wed 05/10/23	Genome informatics and high throughput sequencing Searching genes and gene functions. Genome databases. Variation in the genome. High-throughput sequencing technologies, biological applications, and bioinformatics analysis methods. The Galaxy platform along with resources from the EBI & UCSC.
12	Fri 05/12/23	Transcriptomics and the analysis of RNA-Seq data RNA-Seq aligners. Differential expression tests. RNA-Seq statistics. Counts, FPKMs, and avoiding P-value misuse. Hands-on analysis of RNA-Seq data with R. Gene functional annotation. Functional databases KEGG, InterPro, GO ontologies, and functional enrichment.
13	Wed 05/17/23	RNA-Seq mini-project Differential expression analysis project with DESeq2 followed by gene enrichment and functional annotation with KEGG, InterPro, and GO ontologies.
14	Fri 05/19/23	Hands-on with Git and GitHub Why you should use a version control system. How to perform common operations with Git. Creating and working with your own GitHub repos and navigating and using those of others.
15	Wed 05/24/23	Essential UNIX for bioinformatics Bioinformatics on the command line. Understanding processes. File system structure. Connecting to remote servers. Redirection, streams, and pipes. Workflows for batch processing. Launching and using AWS EC2 instances (or virtual machines).
16	Fri 05/26/23	Analyzing sequencing data in the cloud mini-project A mini-project using AWS EC2 to query, download, decompress and analyze large data sets from the Sequence Read Archive. Practical considerations and best practices for installing bioinformatics software on Linux, transferring large data sets, and performing analysis either locally or on AWS.
17	Wed 05/31/23	Vaccination rate mini-project A topical mini-project using ggplot and dplyr with the latest statewide COVID-19 vaccination data. Practical considerations and best practices for exploratory data analysis.
18	Fri 06/02/23	Mutational signatures of cancer mini-project Cancer genomics resources and bioinformatics tools for investigating the molecular basis of cancer. Large scale cancer sequencing projects. The Cancer Genome Atlas project, the cBioPortal platform, and the COSMIC database. De novo extraction and assignment of mutational signatures. Find-a-gene project due!
19	Wed 06/07/23	Pertussis resurgence mini-project A topical mini-project using web-scraping, JSON based APIs, and advanced dplyr and ggplot to investigate brand new datasets associated with pertussis cases and longitudinal RNA-Seq on the immune response to vaccination.
20	Fri 06/09/23	Portfolio building and discussion of bioinformatics in industry Course summary and review. Making a public facing GitHub pages portfolio of your bioinformatics work. Interview with leading bioinformatics and genomics scientists from industry.

Class material

0: Getting oriented

Topics:
Course introduction. Learning goals & expectations. Meet the instructional team. Setting up your computer with required software.

Goals:

Understand course scope, expectations, logistics and ethics code.
Complete the pre-course questionnaire.
Set up your computer for this course.

Videos:

0.1 - Welcome to BIMM143 (course introduction and overview)
0.2 - Website overview (finding course content and installing software)
Note: format and links on the current website may differ from those in video.

Supporting material:

Handout: Class Syllabus
Pre-course Questionnaire
Computer Setup Instructions
Sign up for our Piazza class Q&A site
View the class GradeBook

1: Welcome to bioinformatics

Topics:
Biology is an information science. History of Bioinformatics. Types of data. Application areas and introduction to upcoming course segments. Introduction to NCBI & EBI resources for the molecular domain of bioinformatics. Hands-on session using NCBI-BLAST, Entrez, GENE, UniProt, Muscle, and PDB bioinformatics tools and databases.

Goals:

Understand the increasing necessity for computation in modern life sciences research.
Get introduced to how bioinformatics is practiced.
Be able to query, search, compare and contrast the data contained in major bioinformatics databases (GenBank, GENE, UniProt, PFAM, OMIM, PDB) and describe how these databases intersect.
The goal of the hands-on session is to introduce a range of core bioinformatics databases and associated online services while actively investigating the molecular basis of several common human diseases.

Videos:

Hands-on Lab:

Lab: Hands-on section worksheet
Feedback: Muddy Point Assessment

Supporting Material:

Lecture Slides: Large PDF / Small PDF
Handout: Major Bioinformatics Databases
Screencast Lab: Video walk-through
Office/Student Hours (Marcos): In-person on Fri at 3pm & Zoom on Mon at 12pm
Office/Student Hours (Josef): Zoom on Mon at 4pm

Homework:

Quiz
Submit your completed lab report (i.e. filled in PDF form) to GradeScope
Readings:
- PDF1: What is bioinformatics? An introduction and overview
- PDF2: Advancements and Challenges in Computational Biology

2: Sequence alignment fundamentals, algorithms and applications

Topics:
Sequence Alignment and Database Searching: Homology, Sequence similarity, Local and global alignment, Heuristic approaches, Database searching with BLAST, E-values, and evaluating alignment scores and statistics.

Goals:

Be able to describe how dynamic programming works for pairwise sequence alignment.
Appreciate the differences between global and local alignment along with their major application areas.
Understand how aligning novel sequences with previously characterized genes or proteins provides important insights into their common attributes and evolutionary origins.
The goals of the hands-on session are to explore the principles underlying the computational tools that can be used to compute and evaluate sequence alignments.
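
As a rough illustration of the first goal, the sketch below fills a dynamic programming table for global (Needleman-Wunsch style) alignment and returns the optimal score. The course labs themselves use R and web tools; Python is used here only as a compact, language-neutral example. The function name and the scoring values (match +1, mismatch -1, gap -2) are illustrative assumptions, not the scheme used in the lab worksheet.

```python
def needleman_wunsch_score(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of two sequences.

    The scoring values are illustrative defaults (assumptions).
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    # score[i][j] holds the best score for aligning seq_a[:i] with seq_b[:j]
    score = [[0] * cols for _ in range(rows)]

    # First column and row: a prefix aligned entirely against gaps
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Each cell is the best of three moves: diagonal (match/mismatch),
    # up (gap in seq_b) or left (gap in seq_a)
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            diagonal = score[i - 1][j - 1] + pair
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(diagonal, up, left)

    # The bottom-right cell covers both sequences end to end
    return score[-1][-1]


print(needleman_wunsch_score("GATTACA", "GATTACA"))  # 7: seven matches
print(needleman_wunsch_score("ACGT", "AGT"))         # 1: three matches, one gap
```

A local (Smith-Waterman style) variant would, in outline, clamp every cell at zero and report the largest cell in the table rather than the bottom-right one.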

Videos:

2.1 - Alignment fundamentals
2.2 - Dot plots
2.3 - Dynamic programming, global alignment
2.4 - Dynamic programming, local alignment and BLAST basics

Hands-on Lab:

Lab: Hands-on section worksheet
Feedback: Muddy Point Assessment

Supporting Material:

Lecture Slides: Large PDF / Small PDF
Dot Plot App Mirrors: app-1, app-2
Screencast Lab: Video walk-through
Office/Student Hours (Marcos): In-person on Fri at 3pm & Zoom on Mon at 12pm
Office/Student Hours (Josef): Zoom on Mon at 4pm

Homework:

Quiz
Submit your completed lab report (i.e. filled in PDF form) answering questions Q1 to Q10 to GradeScope
Complete the following Alignment Problem and submit it to GradeScope
For next week please install R and RStudio
DataCamp: Sign up to our BIMM143_DiazGay group/organization via the link on Piazza or in your UCSD email. We will use this from next week onward. However, feel free to get started with your first course, Introduction to R!

Readings:

PDF1: What is dynamic programming?
PDF2: Fundamentals of database searching

3*: (Project) Find a gene assignment

The find-a-gene project is a required assignment for BIMM 143. The objective of this assignment is for you to demonstrate your grasp of database searching, sequence analysis, structure analysis and the R environment that we have covered to date in class.

You may wish to consult the scoring rubric at the end of the above linked project description and the example report for format and content guidance.

Your responses to questions Q1-Q4 are due in three weeks' time (Tuesday, May 2 at 12pm).
The complete assignment, including responses to all questions, is due Tuesday of week 10 (June 6) at 12pm.
In both instances your PDF format report should be submitted to GradeScope. Late responses will not be accepted under any circumstances.

Videos:

3.1 - Project introduction
Note: due dates may differ from those in video.

3: Advanced sequence alignment and database searching

Topics:
Detecting remote sequence similarity. Substitution matrices. Database searching beyond BLAST with PSI-BLAST. Profiles and HMMs. Protein structure comparisons. Beginning with command line based database searches.

Goals:

Be able to calculate the alignment score between protein (or nucleotide) sequences using a provided scoring matrix such as BLOSUM62.
Understand the limits of homology detection with tools such as BLAST.
Know how to derive a PROSITE style regular expression for aligned motifs.
Be able to calculate a PSSM profile for aligned sequences and subsequently score new sequences using a PSSM.
Be able to perform PSI-BLAST, HMMER and protein structure based database searches and interpret the results in terms of the biological significance of an e-value.
Be familiar with the concepts of True Positives, False Positives, Sensitivity and Specificity.
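
To make the last goal concrete, here is a minimal sketch of how sensitivity and specificity could be computed for a database search when the true homologs are known in advance. It is written in Python for brevity (the course labs use R), and the function name, sequence identifiers and counts are all hypothetical. It also assumes at least one true homolog and at least one non-homolog, so that neither denominator is zero.

```python
def search_performance(reported_hits, true_homologs, database):
    """Summarize a database search against a known set of true homologs.

    All arguments are collections of sequence identifiers (hypothetical here).
    """
    reported_hits = set(reported_hits)
    true_homologs = set(true_homologs)
    database = set(database)

    tp = len(reported_hits & true_homologs)             # true positives
    fp = len(reported_hits - true_homologs)             # false positives
    fn = len(true_homologs - reported_hits)             # false negatives
    tn = len(database - reported_hits - true_homologs)  # true negatives

    return {
        "TP": tp, "FP": fp, "FN": fn, "TN": tn,
        "sensitivity": tp / (tp + fn),  # fraction of true homologs found
        "specificity": tn / (tn + fp),  # fraction of non-homologs rejected
    }


database = [f"seq{i}" for i in range(1, 11)]  # ten made-up entries
true_homologs = ["seq1", "seq2", "seq3", "seq4"]
reported_hits = ["seq1", "seq2", "seq3", "seq7"]

result = search_performance(reported_hits, true_homologs, database)
print(result["sensitivity"])  # 0.75 (3 of 4 true homologs recovered)
print(result["specificity"])  # 5 of 6 non-homologs correctly rejected
```

Loosening the E-value cutoff of a search typically raises sensitivity at the cost of specificity, which is one way to think about the limits of homology detection.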

Hands-on Lab:

Lab: Hands-on section worksheet
Feedback: Muddy Point Assessment

Supporting Material:

Lecture Slides: Large PDF / Small PDF
Bonus: Alignment App

Homework:

Questions: click and select “make a copy”, then follow the instructions
Submit your completed lab report (i.e. filled in PDF form) answering questions Q1 to Q7 to GradeScope
DataCamp: Sign up to our BIMM143_DiazGay group/organization via the link in your UCSD email and start (you do not have to finish yet) Introduction to R! We will complete this next week.
RStudio and R download and setup

4: Bioinformatics data analysis with R

Topics:
Why do we use R for bioinformatics? R language basics and the RStudio IDE. Major R data structures and functions. Using R interactively from the RStudio console.

Goals:

Understand why we use R for bioinformatics.
Familiarity with R’s basic syntax.
Familiarity with major R data structures (vectors, data.frames and lists).
Understand the basics of using functions (arguments, vectorization and recycling).

Videos:

4.1 Why R and RStudio
4.2 Major R data structures, data types, and using functions
4.3 Working with DataCamp
Note: Use your UCSD email invite to sign up and visit our class group/organization

Hands-on Lab:

Lab: Hands-on section
Feedback: Muddy point assessment

Supporting Material:

Lecture Slides: Large PDF / Small PDF
Cheat Sheet: Base R overview
Screencast Lab: Video walk-through focusing on introducing R data structures and core syntax
Optional extension: Advanced conservation analysis of globins with R. This demonstrates where we are going on our R learning journey; you should be able to do analysis like this on your own by the end of the course.
Office/Student Hours (Marcos): In-person on Fri at 3pm & Zoom on Mon at 12pm
Office/Student Hours (Josef): Zoom on Mon at 4pm

Homework:

Quiz
Submit your lab report to GradeScope
DataCamp: Sign up to our BIMM143_DiazGay group/organization via the link in your UCSD email and complete Introduction to R! (~4hrs)

[OPTIONAL] Extra Credit:

Extra credit lab: Introduction to data in R

5: Data exploration and visualization in R

Topics:
The exploratory data analysis mindset. Data visualization best practices. Simple base graphics (including scatterplots, histograms, bar graphs, dot charts, boxplots, and heatmaps). Building more complex charts with ggplot.

Goals:

Appreciate the major elements of exploratory data analysis and why it is important to visualize data.
Be conversant with data visualization best practices and understand how good visualizations optimize for the human visual system.
Be able to generate informative graphical displays including scatterplots, histograms, bar graphs, boxplots, dendrograms and heatmaps and thereby gain exposure to the extensive graphical capabilities of R.
Appreciate that you can build even more complex charts with ggplot and additional R packages.
Be able to write and (re)use basic R scripts to aid with reproducibility.

Videos:

Hands-on Lab:

Lab: Hands-on worksheet
Feedback: Muddy point assessment

Supporting Material:

Lecture Slides: Large PDF / Small PDF
Screencast Lab: Video walk-through
Side-Note: Convincing with graphics
Check out the Data-to-Viz website and the ggplot cheat sheet
Office/Student Hours (Marcos): In-person on Fri at 3pm & Zoom on Mon at 12pm
Office/Student Hours (Josef): Zoom on Mon at 4pm

Homework:

Quiz
Submit your completed PDF lab report to GradeScope
DataCamp: Introduction to data visualization with ggplot2 (~4hrs)

6: Why, when and how of writing your own R functions

Topics:
The why, when and how of writing your own R functions with worked examples. Further extending functionality and utility with R packages. Obtaining R packages from CRAN and Bioconductor. Working with Bio3D for molecular data. Managing genome-scale data with Bioconductor.

Goals:

Understand the structure and syntax of R functions and how to view the code of any R function.
Be able to follow a step-by-step process of going from a working code snippet to a more robust function that reduces duplication and facilitates code re-use.
Be able to find and install R packages from CRAN and Bioconductor.
Understand how to find and use package vignettes, demos, documentation, tutorials and source code repository where available.

Videos:

6.1 - Writing your own functions (why, when and how)
6.2 - Introduction to CRAN & Bioconductor
6.3 - Quick introduction to RMarkdown
6.4 - Optional longer video: Getting started with RMarkdown

Hands-on Lab:

Lab: Hands-on section worksheet
Lab supplement: Hands-on section supplemental information
Feedback: Muddy point assessment

Supporting material:

Lecture Slides: Pt1. Large PDF / Pt2. Large PDF
Screencast Lab: video walk-through
Bonus: Introductory tutorial on R packages
Office/Student Hours (Marcos): In-person on Fri at 3pm & Zoom on Mon at 12pm
Office/Student Hours (Josef): Zoom on Mon at 4pm

Homework:

Quiz
Submit your completed PDF lab report to GradeScope
Write a function: See Q6 of the hands-on lab supplement above. This entails turning a supplied code snippet into a more robust and re-usable function that will take any of the three listed input proteins and plot the effect of drug binding. Note the assessment rubric and submission instructions within the document.
DataCamp: Please complete the Introduction to ggplot2 course

Other:

Flat files for practicing importing with read.table: test1.txt / test2.txt / test3.txt

7: Introduction to machine learning for bioinformatics

Topics:
Unsupervised learning, supervised learning, and reinforcement learning. Focus on unsupervised learning. K-means clustering. Hierarchical clustering. Dimensionality reduction, visualization, and analysis. Principal Component Analysis (PCA). Practical considerations and best practices for the analysis of high dimensional datasets.

Goals:

Understand the major differences between unsupervised and supervised learning.
Be able to create k-means and hierarchical cluster models in R.
Be able to describe how the k-means and bottom-up hierarchical cluster algorithms work.
Know how to visualize and integrate clustering results and select good cluster models.
Be able to describe in general terms how PCA works and its major objectives.
Be able to apply PCA to high dimensional datasets and visualize and integrate PCA results (e.g., identify outliers, find structure in features and aid in complex dataset visualization).
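
For the goal of describing how k-means works, the following sketch spells out the two alternating steps (assign each point to its nearest center, then move each center to the mean of its members). It is a plain Lloyd-style iteration written in Python for illustration only; it is not the code used in the lab, which works in R, and the function name, starting centers and data points are made-up assumptions.

```python
def kmeans(points, initial_centers, max_iter=100):
    """Plain Lloyd-style k-means on tuples of numbers.

    Returns (centers, labels); labels come from the last assignment step.
    """
    centers = [tuple(c) for c in initial_centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center
        labels = []
        for p in points:
            sq_dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            labels.append(sq_dists.index(min(sq_dists)))

        # Update step: each center moves to the mean of its members
        new_centers = []
        for k, old_center in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centers.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centers.append(old_center)  # keep a center with no members

        if new_centers == centers:  # converged: nothing moved
            break
        centers = new_centers
    return centers, labels


# Two visually obvious groups of made-up 2D points
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, labels = kmeans(points, initial_centers=[(0, 0), (10, 10)])
print(labels)   # [0, 0, 0, 1, 1, 1]
print(centers)  # roughly (1.33, 1.33) and (8.33, 8.33)
```

The result depends on the starting centers, which is why practical implementations usually try several random starts and keep the solution with the smallest within-cluster sum of squares.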

Videos:

Hands-on Lab:

Lab: Hands-on section worksheet
Feedback: Muddy point assessment.

Supporting material:

Lecture Slides: Large PDF / Small PDF
WebApp: Introduction to PCA
Data files: UK_foods.csv / expression.csv
Screencast Lab: video walk-through
Bonus: StackExchange discussion on PCA
Office/Student Hours (Marcos): In-person on Fri at 3pm & Zoom on Mon at 12pm
Office/Student Hours (Josef): Zoom on Mon at 4pm

Homework:

Submit your completed PDF lab report to GradeScope
DataCamp: Introduction to the Tidyverse (~4hrs)

8: Unsupervised learning mini-project

Topics:
Hands-on project session with unsupervised learning analysis of cancer cells. Practical considerations and best practices for the analysis and visualization of high dimensional datasets.

Goals:

Be able to import and prepare data for unsupervised learning analysis.
Be able to apply and test combinations of PCA, k-means, and hierarchical clustering to high dimensional datasets and critically review results.

Hands-on Lab:

Lab: Mini-Project
Feedback: Muddy point assessment

Supporting material:

Data file: WisconsinCancer.csv / new_samples.csv
Lecture Slides: Large PDF / Small PDF

Homework:

Submit your completed PDF lab report to GradeScope
DataCamp: Complete Introduction to the Tidyverse (~4hrs)

9: Structural bioinformatics

Topics:
Protein structure function relationships. Protein structure and visualization resources. Modeling energy as a function of structure. Homology modeling. AlphaFold. Predicting functional dynamics. Inferring protein function from structure.

Goals:

View and interpret the structural models in the PDB.
Understand the classic Sequence > Structure > Function via energetics and dynamics paradigm.
Be able to use VMD for biomolecular visualization and analysis.
Appreciate how AlphaFold has advanced structural bioinformatics.
Be able to use the Bio3D package for exploratory analysis of protein sequence-structure-function-dynamics relationships.

Videos:

Hands-on Lab:

Lab: Hands-on section
Feedback: Muddy point assessment

Supporting material:

Lecture Slides: Large PDF / Small PDF
Software links: VMD download

Homework:

Submit your completed PDF lab report to GradeScope

10: Halloween candy mini-project

Topics:
A fun and topical mini-project with unsupervised learning analysis of halloween_candy. Practical considerations and best practices for the analysis and visualization of high dimensional datasets.

Hands-on Lab:

Lab: Mini-Project

Homework:

Submit your completed PDF lab report to GradeScope

11: Genome informatics

Topics:
Genome sequencing technologies past, present and future (Sanger, Shotgun, PacBio, Illumina, toward the $500 human genome). Biological applications of sequencing. Variation in the genome. RNA-Sequencing for gene expression analysis. Major genomic databases, tools, and visualization resources from the EBI & UCSC. The Galaxy platform for quality control and analysis. Sample Galaxy RNA-Seq workflow with FastQC and Bowtie2.

Goals:

Appreciate and describe in general terms the rapid advances in sequencing technologies and the new areas of investigation that these advances have made accessible.
Understand the process by which genomes are currently sequenced and the bioinformatics processing and analysis required for their interpretation.
For a genomic region of interest (e.g. the neighborhood of a particular SNP), use a genome browser to view nearby genes, transcription factor binding regions, epigenetic information, etc.
Be able to use the Galaxy platform for basic RNA-Seq analysis from raw reads to expression value determination.
Understand the FASTQ file format and the information it holds.
Understand the SAM/BAM file format and the information it holds.
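
As an illustration of what a FASTQ file holds, the sketch below parses records into an identifier, a sequence, and per-base quality scores. It assumes the common four-line record layout and Phred+33 quality encoding; the function name and the example read are hypothetical, and Python is used only for brevity since the lab itself works in Galaxy and R.

```python
def parse_fastq(text):
    """Parse FASTQ text into a list of records.

    Assumes four lines per record and Phred+33 encoded qualities.
    """
    lines = text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")

    records = []
    for i in range(0, len(lines), 4):
        header, sequence, separator, quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        records.append({
            "id": header[1:],                          # text after the leading '@'
            "sequence": sequence,
            "phred": [ord(ch) - 33 for ch in quality],  # Phred+33 decoding
        })
    return records


example = "@read1\nGATTACA\n+\nIIIIII#\n"  # a made-up single read
for record in parse_fastq(example):
    print(record["id"], record["sequence"], record["phred"])
# read1 GATTACA [40, 40, 40, 40, 40, 40, 2]
```

A Phred score Q corresponds to an estimated base-call error probability of 10^(-Q/10), so the final base in this example (Q = 2) is far less reliable than the others (Q = 40).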

Videos:

11.1 - Introduction to genomics
11.2 - Sequencing methods from Jonathan Weissman (UCSF)
11.3 - The basics of RNASeq work-flows
11.4 - Optional: Lessons from the Human Genome Project

Hands-on Lab:

Lab: Hands-on section worksheet
Feedback: Muddy point assessment

Supporting material:

Lecture Slides: Large PDF / Small PDF
Screencast Lab: video walk-through
Galaxy Server, create a free account for section 3 of the lab
RNA-Seq data files: HG00109_1.fastq / HG00109_2.fastq / genes.chr17.gtf / Expression genotype results
SAM/BAM file format description
Office/Student Hours (Marcos): In-person on Fri at 3pm & Zoom on Mon at 12pm
Office/Student Hours (Josef): Zoom on Mon at 4pm

Homework:

Submit your RMarkdown/Quarto generated PDF lab report answering questions Q1 to Q12 to GradeScope

[OPTIONAL] Extra Credit:

Population analysis: Submit to GradeScope your RMarkdown/Quarto generated PDF with working code, output and narrative text answering Q13 and Q14 in this week’s hands-on section worksheet

12: Transcriptomics and the analysis of RNA-Seq data

Topics:
Analysis of RNA-Seq data with R. Differential expression tests. RNA-Seq statistics. Counts and FPKMs. Normalizing for sequencing depth. DESeq2 analysis. Gene finding and functional annotation from high throughput sequencing data. Functional databases KEGG, InterPro, GO ontologies, and functional enrichment.

Goals:

Given an RNA-Seq dataset, find the set of significantly differentially expressed genes and their annotations.
Gain competency with data import, processing and analysis with DESeq2 and other Bioconductor packages.
Understand the structure of count data and metadata required for running analysis.
Be able to extract, explore, visualize and export results.
Perform a GO analysis to identify the pathways relevant to a set of genes (e.g. identified by transcriptomic study or a proteomic experiment). Use both Bioconductor packages and online tools to interpret gene lists and annotate potential gene functions.
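
To illustrate the difference between raw counts and FPKMs mentioned in the topics, the sketch below converts counts to FPKM (fragments per kilobase of transcript per million mapped fragments). The gene names, lengths, counts and library size are invented for the example, and Python is used only for illustration. Note that DESeq2 works from raw counts with its own size-factor normalization rather than from FPKM values.

```python
def fpkm(count, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    # count / (length in kb) / (library size in millions)
    return count * 1e9 / (gene_length_bp * total_mapped_fragments)


# Hypothetical values for two genes from one sample
counts = {"geneA": 500, "geneB": 1000}
lengths_bp = {"geneA": 1000, "geneB": 4000}
library_size = 10_000_000  # total mapped fragments in the sample (assumed)

for gene in counts:
    print(gene, fpkm(counts[gene], lengths_bp[gene], library_size))
# geneA 50.0
# geneB 25.0
```

Here geneB has twice the raw count of geneA but half the FPKM, because it is four times longer; dividing by library size likewise makes samples sequenced to different depths more comparable.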

Videos:

Hands-on Lab:

Lab: Hands-on section worksheet
Feedback: Muddy point assessment

Supporting material:

Lecture Slides: Large PDF / Small PDF
Detailed Bioconductor setup instructions
Screencast Lab: video walk-through
Data files: airway_scaledcounts.csv / airway_metadata.csv
Office/Student Hours (Marcos): In-person on Fri at 3pm & Zoom on Mon at 12pm
Office/Student Hours (Josef): Zoom on Mon at 4pm

Readings:

Excellent review article: Conesa et al. A survey of best practices for RNA-seq data analysis. Genome Biology (2016)
An oldie but a goodie: Soneson et al. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Research (2015)
Good review article: Trapnell et al. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat Biotechnol (2013)
Abstract and introduction sections of: Himes et al. RNA-Seq transcriptome profiling identifies CRISPLD2 as a glucocorticoid responsive gene that modulates cytokine function in airway smooth muscle cells. PLoS ONE (2014)

Homework:

Submit your completed PDF lab report answering questions Q1 to Q10 to GradeScope

13: RNA-Seq analysis mini-project

Topics:
Differential expression analysis mini-project with DESeq2 followed by gene enrichment and functional annotation with KEGG, InterPro, and GO ontologies.

Hands-on Lab:

Lab: Mini-Project
Feedback: Muddy point assessment

Supporting material:

Data files: GSE37704_featurecounts.csv / GSE37704_metadata.csv

Homework:

Submit your completed PDF lab report to GradeScope
DataCamp: Intermediate R (~4hrs)

14: Hands-on with Git and GitHub

Topics:
Why you should use a version control system. How to perform common operations with Git, currently the most popular version control system. Creating and working with your own GitHub repos and navigating and using those of others. Syncing bioinformatics work to GitHub.

Videos:

14.1 - OPTIONAL: Git for humans

Hands-on Lab:

Lab: Hands-on section worksheet
Feedback: Muddy-Point-Assessment

Supporting material:

Lecture Slides: Large PDF
Resource for going further: Happy Git with R

Homework:

Submit your GitHub class repository URL to GradeScope
DataCamp: Complete Intermediate R (~4hrs)

15: Essential UNIX for bioinformatics

Topics:
Bioinformatics on the command line. Why do we use UNIX for bioinformatics? UNIX philosophy. 21 Key commands. Understanding processes. File system structure. Connecting to remote servers. Redirection, streams and pipes. Workflows for batch processing. Organizing computational projects. Going further with your own computer in the cloud. Launching and using AWS EC2 instances (or virtual machines).

Goals:

Understand why we use UNIX for bioinformatics.
Use UNIX command-line tools for file system navigation and text file manipulation.
Have a familiarity with 21 key UNIX commands that we will use ~90% of the time.
Be able to connect to remote servers from the command line.
Use existing programs at the UNIX command line to analyze bioinformatics data.
Understand IO redirection, streams and pipes.
Understand best practices for organizing computational projects.

Videos:

15.1 - Essential UNIX for bioinformatics I
15.2 - Essential UNIX for bioinformatics II
15.3 - Manipulating files on UNIX machines
15.4 - UNIX superpowers: using pipes and connecting to remote machines

Hands-on Lab:

Hands-on section worksheet
- (Part I) Starting your own computer in the cloud
- (Part II) Accessing and using your AWS instance
AWS Console URL: https://awsed.ucsd.edu/
Feedback: Muddy point assessment

Supporting material:

Lecture Slides: Large PDF / Small PDF
Screencast Lab: launching an AWS EC2 instance
Office/Student Hours (Marcos): In-person on Fri at 3pm & Zoom on Mon at 12pm
Office/Student Hours (Josef): Zoom on Mon at 4pm

Homework:

Questions: complete PDF form with your answers, save, and submit to GradeScope
No lab report due this week
Bonus: Introduction to the Unix shell (~4hrs)

16: Analyzing sequencing data in the cloud mini-project

Topics:
A mini-project using AWS EC2 cloud computing resources to query, download, decompress, and analyze large data sets from the NCBI’s Sequence Read Archive (SRA). Practical considerations and best practices for installing bioinformatics software on Linux, transferring large data sets, and performing analysis either locally or on AWS.

Hands-on Lab:

Lab: Mini-Project

Supporting material:

AWS Console URL: https://awsed.ucsd.edu/

Homework:

Submit your completed PDF lab report to GradeScope

17: Vaccination rate mini-project

Topics:
A topical mini-project using ggplot and dplyr with the latest statewide COVID-19 vaccination data. Practical considerations and best practices for exploratory data analysis.

Hands-on Lab:

Lab: Mini-Project

Supporting material:

Data files: Statewide COVID-19 vaccines administered by ZIP code (updated: 2023/05/23)
Original source of the COVID-19 vaccination data at the California Open Data Portal

Homework:

Submit your completed PDF lab report to GradeScope

[OPTIONAL] Extra Credit:

DataCamp: Introduction to the Unix Shell (~4hrs)

18: Mutational signatures of cancer mini-project

Topics:
A mini-project focusing on cancer genomics resources and bioinformatics tools for investigating the molecular basis of cancer. Large scale cancer sequencing projects. The Cancer Genome Atlas project, the cBioPortal platform, and the COSMIC database. De novo extraction and assignment of mutational signatures.

Hands-on Lab:

Lab: Mini-Project

Supporting material:

Lecture Slides:
- (Part I) Large PDF
- (Part II) Large PDF
The cBioPortal platform
Mutational matrices:

Homework:

Find-a-gene project due on Tuesday, June 6.

[OPTIONAL] Extra Credit:

Submit your completed PDF lab report to GradeScope
DataCamp: Introduction to the Unix Shell (~4hrs)

19: Pertussis resurgence mini-project

Topics:
A topical mini-project using web-scraping, JSON based APIs and advanced dplyr and ggplot to investigate brand new datasets associated with pertussis cases and longitudinal RNA-Seq on the immune response to distinct vaccination strategies.

Hands-on Lab:

Lab: Mini-Project

Supporting material:

Additional resources: CDC pertussis tracking data / The CMI-PB resource

Homework:

Submit your completed PDF lab report to GradeScope

20: GitHub pages online portfolio building and discussion of bioinformatics in industry

Topics:
Course summary and review. Making a public facing GitHub pages portfolio of your bioinformatics work. Discussion of bioinformatics and genomics career opportunities. Interview with leading bioinformatics and genomics scientists from industry.

Lecture:

Lecture Slides: Large PDF

Feedback:

Course evaluation:
- Ether-pad version
- Alternative form version

Bioinformatics in industry session:
Discussion of bioinformatics and genomics career opportunities with four leading bioinformatics and genomics scientists from industry:

Live session (1:30pm): Recording
- Dr. Sebastià Franch-Expósito (Senior Translational Scientist at Tempus Lab, Inc.)
- Uma Mahto (Product Manager at Guardant Health)
- Dr. Daniela Nachmanson (Bioinformatics Scientist II at TwinStrand Biosciences)
Prerecorded: Prerecorded interview
- Jason Dai (Bioinformatics scientist at Thermo Fisher Scientific)

Videos:

20.1 Live stream interview with leading bioinformatics and genomics scientists from industry including Dr. Ali Crawford (Associate Director, Scientific Research at Illumina Inc.), Dr. Bjoern Peters (Full Professor and Principal Investigator at La Jolla Institute), and Dr. Ana Grant (Director of Research Informatics at Synthetic Genomics Inc.).

Supporting material:

DataCamp: Bioinformatics Extension Track

Homework:

Submit your GitHub page URL to GradeScope