ADVANCED BIOSTATISTICS

Academic year and teacher
Academic year
2015/2016
Teacher
ANDREA BENAZZO
Credits
6
Didactic period
Second semester
SSD
SECS-S/01

Training objectives

The course aims to provide the skills necessary for the statistical analysis of complex biological processes and the interpretation of experimental data. In particular, students will learn the theory of linear association between variables and how to detect it using standard statistical tools. The course also covers the concept of likelihood, linear regression between numerical variables, and general linear models, including how to use them to detect the effect of particular factors on biological variables. Model assumptions will be discussed, along with methods that are useful when those assumptions are violated. Theoretical knowledge will be applied to solve realistic biological problems, computing solutions by hand or using the R software on a personal computer.
The student will acquire knowledge of the logical rationale behind the main statistical methodologies and of how to apply them to experimental biological data. Moreover, through computer laboratory sessions, the student will learn how to apply the theoretical concepts in practice to real biological data.

Prerequisites

No preparatory course is required. However, the theoretical elements of advanced biostatistics require a good knowledge of basic statistics. Moreover, applying them to biological data requires basic computer skills.

Course programme

Lectures (40 hours, 5 CFU) and computer laboratory (12 hours, 1 CFU) covering the following topics:
Introduction to R (6 hours): description of the statistical analysis environment and its basic commands; description and creation of vectors, matrices and data frames; simple and complex arithmetical functions; how to create and modify plots.
Linear correlation between numerical variables (8 hours): estimating the correlation coefficient, hypothesis testing, main assumptions, nonparametric correlation.
Regression (8 hours): the concept of linear regression, quality of predictions, hypothesis testing on the slope, main assumptions, variable transformations, measurement error, non-linear regression.
General linear models (8 hours): linear regression and ANOVA in the general linear model framework, testing the effect of one or multiple factors, factorial design analysis, covariates, main assumptions.
Computer-intensive methods (8 hours): hypothesis testing using simulations, the randomization test, the bootstrap.
Likelihood (8 hours): the likelihood concept, maximum likelihood estimation, the likelihood ratio test.
Final exercises (6 hours): guided solution of several exercises covering all the topics.

Didactic methods

The course consists of lectures (40 hours) and practical sessions in the computer room (12 hours), for 52 hours in total. Lectures are given using PowerPoint slides, with the blackboard used to explain theoretical concepts. At the end of each main topic, theoretical knowledge will be applied to solve realistic biological problems using the R software on a personal computer.

Learning assessment procedures

The exam is divided into two parts, written and practical, and takes place in the computer room. The first part consists of 20 questions, including short open questions, multiple-choice questions and small exercises that do not require a calculator. Two points are awarded for each correct answer to a short open question, and one point for each correct multiple-choice answer or exercise. In the second part (practical), the student works at a personal computer on a Word document containing the description of two biological problems, the associated dataset and some questions. To pass the exam, the student must answer at least half of the questions. The questions must be answered using the R software. Each part lasts one hour.

Reference texts

M.C. Whitlock, D. Schluter, Analisi dei dati biologici, Zanichelli, Bologna, 2010.
M. Pagano, K. Gauvreau, Biostatistica, Idelson-Gnocchi, Napoli, 2003.
Slides and the experimental data for the exercises of each lecture are available on the lecturer's web page.