
DATA MINING AND ANALYTICS

Academic year and teacher
Academic year
2022/2023
Teacher
FABRIZIO RIGUZZI
Credits
6
Didactic period
Second semester
SSD
INF/01

Learning objectives

The main goal of the course is to enable students to analyze data stored in databases using descriptive and predictive tools of increasing complexity.
The main knowledge acquired concerns:
- data analytics,
- knowledge discovery in databases,
- data mining,
- machine learning.
The basic skills acquired (that is, the ability to apply the acquired knowledge) are:
- descriptive data analytics,
- predictive data analytics.

Prerequisites

Students must already have acquired the following concepts and knowledge (provided, for example, by the courses “Databases”, “Computer Science” and “Foundations of Artificial Intelligence”):
- relational data model,
- SQL data manipulation and query language,
- procedural programming languages (Java, C),
- logic programming languages.

Course programme

The course consists of 60 hours of teaching, partly in the classroom and partly in the laboratory.
Introduction to data mining (7.5 hours): review of probability theory, introduction to learning, concept learning and the general-to-specific ordering.
Decision trees, propositional rule learning and instance-based learning (10 hours).
Bayesian networks (7.5 hours): inference and learning.
Kernel methods, neural networks and deep learning (15 hours).
First-order rule learning (5 hours).
Logical-probabilistic languages (7.5 hours): inference and learning.
Descriptive data mining (7.5 hours): clustering, association rules.
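As a quick consistency check, the per-topic hours in the programme can be summed to confirm that they account for the whole 60-hour course. The sketch below is illustrative only: the dictionary name and the shortened topic labels are our own, while the hours are copied from the programme.

```python
# Topic labels are shortened (hypothetical names); hours are as listed in the programme.
PROGRAMME_HOURS = {
    "Introduction to data mining": 7.5,
    "Decision trees, propositional rules, instance-based learning": 10,
    "Bayesian networks": 7.5,
    "Kernel methods, neural networks and deep learning": 15,
    "First-order rule learning": 5,
    "Logical-probabilistic languages": 7.5,
    "Descriptive data mining": 7.5,
}
TOTAL_HOURS = 60

total = sum(PROGRAMME_HOURS.values())
# The per-topic hours should add up to the stated course length.
assert total == TOTAL_HOURS, f"topics sum to {total}, expected {TOTAL_HOURS}"
print(f"{total} hours over {len(PROGRAMME_HOURS)} topics")  # 60.0 hours over 7 topics
```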

Didactic methods

The course consists of 60 hours of teaching, partly in the classroom and partly in the computer laboratory.
The lectures will cover all the course topics and will include guided exercises on the computer.
Laboratory exercises will cover the use of the Weka system for solving machine learning and data mining problems, as well as the use of rule induction systems.

Learning assessment procedures

The aim of the exam is to verify the level at which the learning objectives described above have been achieved.
The examination consists of a written test and a theory test.
The written test consists of four exercises on the course topics. It lasts two hours and is worth 17 points. Teaching material may be used.
The theory test consists of three questions on the theoretical topics of the course. It is worth 15 points. Teaching material may not be used.
The final mark is the sum of the marks obtained in the two parts. The exam is passed if the marks in the written test and in the theory test are both at least 9.
The two parts can be taken on different exam dates.
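The scoring rule can be expressed as a small function, which makes explicit that each part must reach 9 points on its own and that a high mark in one part does not compensate for a low mark in the other. This is a minimal sketch, assuming integer marks; the function name `final_mark` is hypothetical, and the conversion of the raw sum (up to 32) to the official grade scale is not described in the rules above, so it is not modelled.

```python
from typing import Optional

# Values from the assessment rules; integer marks are an assumption.
WRITTEN_MAX = 17    # written test: four exercises, two hours, material allowed
THEORY_MAX = 15     # theory test: three questions, no material allowed
PASS_THRESHOLD = 9  # minimum mark required in EACH part


def final_mark(written: int, theory: int) -> Optional[int]:
    """Return the sum of the two parts, or None if the exam is not passed."""
    if not 0 <= written <= WRITTEN_MAX:
        raise ValueError(f"written mark must be in [0, {WRITTEN_MAX}]")
    if not 0 <= theory <= THEORY_MAX:
        raise ValueError(f"theory mark must be in [0, {THEORY_MAX}]")
    # Both parts are checked independently against the threshold.
    if written < PASS_THRESHOLD or theory < PASS_THRESHOLD:
        return None
    return written + theory


if __name__ == "__main__":
    print(final_mark(12, 10))  # 22
    print(final_mark(17, 8))   # None: theory mark below 9
```

Since the two parts may be taken on different exam dates, the two arguments can come from different sessions; the sketch does not track dates.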

Reference texts

The reference texts are:
Teacher's handouts.
Fabrizio Riguzzi, “Foundations of Probabilistic Logic Programming”, River Publishers, 2018
T. M. Mitchell, “Machine Learning”, McGraw-Hill, 1997
Ian Witten, Eibe Frank, “Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations”, Second Edition, Morgan Kaufmann Publishers, 2005
Hal Daumé III, “A Course in Machine Learning”, http://www.ciml.info/
Texts for further reading:
Luc De Raedt, “Logical and Relational Learning”, Springer, Series: Cognitive Technologies, 2008
I. Goodfellow, Y. Bengio, and A. Courville, “Deep Learning”, MIT Press, 2016
Alessandro Rezzani, “Big Data - Architettura, tecnologie e metodi per l’utilizzo di grandi basi di dati”, Apogeo Education, 2013
Matteo Golfarelli, Stefano Rizzi, “Data Warehouse, Teoria e pratica della progettazione”, McGraw-Hill, 2006
Luc De Raedt, Kristian Kersting, Sriraam Natarajan, and David Poole, “Statistical Relational Artificial Intelligence: Logic, Probability, and Computation”, Morgan & Claypool, 2016
Daphne Koller, Nir Friedman, “Probabilistic graphical models: principles and techniques”, MIT Press, 2009
J. Ross Quinlan, “C4.5: Programs for Machine Learning”, Morgan Kaufmann Publishers, 1992
N. Lavrac and S. Dzeroski, “Inductive Logic Programming Techniques and Applications”, Ellis Horwood, 1994, http://www-ai.ijs.si/SasoDzeroski/ILPBook/