BIOINFORMATICS AND GENOME ANALYSIS

Academic year and teacher
Academic year
2015/2016
Teacher
SILVIA FUSELLI
Credits
6
Didactic period
First semester
SSD
BIO/18

Training objectives

Genomics studies the content, structure, expression and evolution of the genetic material that encodes the structures and functions of living organisms and is inherited from generation to generation. This course explores the branch of bioinformatics that makes it possible to analyse “in silico” the high-throughput output of genetic and genomic sequencing projects based on next-generation methods.
Knowledge and understanding
The course aims to
- provide knowledge of the structure and organization of prokaryotic and eukaryotic genomes
- provide knowledge of the strategies and techniques used to study different genomes
- teach how to retrieve and interpret information from the most important biological databases
- provide theoretical elements of bioinformatics and computational genomics
The practical activities will give students a basic knowledge of the Linux operating system, which is needed to analyse the large datasets produced by genome sequencing projects.
Ability to apply knowledge and understanding
The students will be able to
- design the analysis of a prokaryotic or eukaryotic genome, or the study of a reduced representation of a genome
- retrieve information from the most important biological databases and extract gene, genome and, in part, protein data useful for designing experiments or for analysing data from genomic sequencing experiments
During the laboratory sessions, students will work through the bioinformatic workflow of a shotgun genome sequencing project. By applying the most commonly used bioinformatic tools, they will learn how to extract biologically meaningful information from raw data.

Prerequisites

No formal prerequisite courses. The bioinformatic analysis of genomic data requires a good knowledge of genetics, in particular of the laws of inheritance and of mutational mechanisms. A good knowledge of molecular biology is also required, in particular of nucleic acid replication, transcription and translation, together with a basic knowledge of biostatistics.

Course programme

Lectures and computer laboratory.

The central dogma and the molecules of inheritance (4h)
Nucleic acids and the genetic code. Amino acids and their substitutions. Mutation as a source of variation. Different kinds of mutations, definitions, functional effects of synonymous and nonsynonymous changes.
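The distinction between synonymous and nonsynonymous changes can be illustrated with a minimal sketch. It assumes the standard genetic code and DNA (not RNA) codons; the function name `classify_substitution` and the labels it returns are illustrative choices, not part of the course material.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Classify a codon change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"    # a premature stop codon is introduced
    if ref_aa == "*":
        return "stop-loss"   # the original stop codon is removed
    return "missense"        # a different amino acid is encoded


print(classify_substitution("GAA", "GAG"))  # Glu -> Glu
print(classify_substitution("GAG", "GTG"))  # Glu -> Val
print(classify_substitution("TGG", "TGA"))  # Trp -> stop
```

Nonsense, stop-loss and missense changes are all nonsynonymous; the sketch only separates them to show that nonsynonymous changes can differ in their functional effect.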

Sizes and organization of genomes (8h)
- Prokaryotic and eukaryotic genomes. Chromosomes: structure, numbers, ploidy, the K, N and C paradoxes.
- Genes: traditional and extended definitions, the ENCODE project. Simple and complex genomes: how many genes and functional regions; gene expression and epigenetics.
- The human genome: an example of a complex eukaryotic genome. Contents of the human genome; description and definition of genomic variation; characterizing human genomic variation: international projects; examples of genomic regions coding for the proteome: extremely short and long genes, gene families (moderately repetitive DNA).

Methods for genome analysis, Next Generation Sequencing (NGS) (10h)
- Frederick Sanger and the development of DNA sequencing.
- Second-generation sequencing methods (NGS or High Throughput Sequencing). Library preparation, controls and quantification, sequencing and signal detection.
- Third-generation sequencing methods (Single Molecule Real Time Technology and Nanopore sequencing).

Metagenomics: a new perspective in ecology (4h)
- How to use new sequencing technologies to study environmental samples. Definitions and examples. From samples to sequences: the standard workflow of a metagenomic analysis.
- Barcoding and metabarcoding.

Comparing sequences: pairwise and multiple sequence alignments (6h)
- Alignments: why? Similarity and homology; global and local alignments.
- Alignment algorithms: substitution matrices for DNA and proteins, gap penalties; exhaustive and heuristic algorithms (Needleman-Wunsch, Smith-Waterman, FASTA, BLAST).
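As an illustration of an exhaustive global alignment, the following is a minimal sketch of the Needleman-Wunsch dynamic programming algorithm. It assumes a simple match/mismatch score in place of a full substitution matrix and a linear gap penalty; the scoring values and the function name are illustrative assumptions, not the settings used in the course.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First row and column: aligning a prefix against gaps only.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for a[i-1] aligned with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the matrix: each cell is the best of diagonal, up and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = needleman_wunsch("ACGT", "ACT")
print(best)
print(top)
print(bottom)
```

Smith-Waterman local alignment uses the same recurrence but floors every cell at zero and starts the traceback from the highest-scoring cell, whereas FASTA and BLAST are heuristics that avoid filling the whole matrix.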

Searching sequences in biological databases and Basic Local Alignment Search Tool (BLAST) (4h)
- Biological databanks (computer exercises to learn how to search specific databases): National Center for Biotechnology Information (NCBI); ENSEMBL. Students are asked to bring their own laptops; 12 laptops are also available.
- How to use BLAST: practical exercises.

Bioinformatics and Next Generation Sequencing (4+12h)
Students will use a computer to analyse sequencing data from High Throughput Sequencing (Illumina technology). In particular, we will explore pipelines based on common bioinformatics tools that convert raw data files into biologically meaningful information.
In detail:
basics of BASH, the command interpreter used to operate Linux (4h); from raw data (sequences in FASTQ format) to variable sites (VCF format). Standard programs and modules will be run to go through the following steps: FASTQ quality control and trimming; alignment to a reference genome; BAM refinement; BAM check and visualization; variant calling; variant filtering and validation (12h).
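The first step of the workflow, FASTQ quality control and trimming, can be sketched as follows. This is only an illustration under stated assumptions: four-line FASTQ records, Phred+33 quality encoding, and a simple rule that removes low-quality bases from the 3' end. The function names and the threshold of 20 are hypothetical; in the laboratory this step is carried out with dedicated programs rather than with hand-written code.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ file."""
    while True:
        header = handle.readline().rstrip()
        if not header:  # end of file (the sketch also stops at a blank line)
            break
        sequence = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip()
        yield header[1:], sequence, quality


def phred_scores(quality, offset=33):
    """Decode a quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality]


def trim_3prime(sequence, quality, min_q=20):
    """Remove trailing bases whose Phred score is below min_q."""
    scores = phred_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return sequence[:end], quality[:end]


# A made-up record: six high-quality bases followed by two low-quality ones.
example = io.StringIO("@read1\nACGTACGT\n+\nIIIIII##\n")
for name, sequence, quality in read_fastq(example):
    scores = phred_scores(quality)
    mean_q = sum(scores) / len(scores)
    trimmed_seq, trimmed_qual = trim_3prime(sequence, quality)
    print(name, mean_q, trimmed_seq)
```

A Phred score Q corresponds to an error probability of 10^(-Q/10), so a threshold of 20 keeps bases with an estimated error rate of at most 1 in 100.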

Didactic methods

Lectures and computer laboratory. The course is structured in 52 hours (6 CFU): 40 hours of lectures and 12 hours of laboratory practice. Lectures are held weekly in class, using PowerPoint slides and videos illustrating NGS technologies; biological databases and alignment algorithms are described and explored through online exercises.
During the laboratory, students are divided into groups of 30. Currently, 15 desktop computers are booted from USB keys with the Linux operating system, on which all the programs and files needed for the projects are available. Two students work together on a single desktop. The virtualization of the computer system is ongoing; it will make it possible to access Linux directly from each terminal, and 20 terminals will be available.

Learning assessment procedures

The aim of the exam is to verify to what extent the learning objectives have been achieved. The exam is divided into three parts that take place on the same day. A minimum score of 6/10 is required to pass each part, and the exam is passed only if all three parts are passed.
Part I: 4 questions (multiple choice and short open questions). This part is aimed at verifying the basic knowledge and understanding of genomics, sequencing methods, metagenomics and sequence alignments. Time: 1 hour.
Part II: searching biological databanks and using BLAST. Working online at a computer, the student must complete 4 or 5 exercises that require retrieving nucleotide or protein sequences from databases starting from keywords or accession numbers. Specific questions on the retrieved sequences are asked. Similarly, the correct BLAST algorithm must be used to answer specific questions (searching for homologous genes or proteins starting from sequences), and the search parameters must be described. Time: 1 hour and 30 minutes.
Part III: in front of a computer (oral exam), the student will demonstrate his/her knowledge of the Linux file system and the ability to use BASH commands. A few questions will be asked on the analysis workflow of the practical part of the course, specifically on the reason for running specific software modules and why doing so leads to a better result. Knowledge of the main file formats is required. This part lasts about 15 minutes.

Reference texts

Both genomics and bioinformatics are constantly evolving, so no textbook can be considered exhaustive. For the first part of the course, the first 8 chapters of “Introduction to Genomics” by Arthur M. Lesk (second edition, Oxford University Press) may be useful. The rest of the course (i.e. databanks and sequencing methods) should be studied using the online resources indicated by the teacher. The PDF version of the slides shown during the lectures is available, together with all the useful links to online resources. Examples of practical exercises with solutions for the second part of the exam are available on the teacher's website.