Data Mining course page

 

 

Mario Martin

Email: mmartin@cs.upc.edu

http://www.cs.upc.edu/~mmartin/DM.htm

 

Location

Office #202

Omega building, Campus Nord

 

Office hours:

Tuesday: 9:00-10:00

Thursday: 12:00-14:00

For other times, contact me by email.

 

 

Material

 

Slides:

DM1 – Supervised Learning: Concepts and evaluation

DM2 – Data reduction

DM3 – Decision Trees

DM4 – Naive Bayes and KNN

DM5 – Support Vector Machines

DM6 – Meta-Methods

DM7 – Association Rules

 

 

Laboratory:

 

Project

Here

Guidelines

 

Software

            Poll of the most used data mining tools, 2018. [Older polls: 2017 and 2016]

 

Weka

KNIME

RapidMiner

 

scikit-learn

Python graph gallery

 

Scripts

Pre-processing with pandas

Python Notebook for KNN

Python Notebook for Preprocessing in KNN

Python Notebook for Naive Bayes

Python Notebooks for Decision Trees

Meta-methods demonstration in Python

SVMs notebook

Notebook explaining techniques for unbalanced datasets

 

RapidMiner workflow for KNN and grid search

 

Toy data for feature selection:

            FSnormal.arff : Normal data in which only the last two features are relevant

            foo.csv : Normal data in which only the last two features are relevant

            FSbool.arff : Boolean data with a nonlinear relation among the first three features
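
To give a feel for what these toy files are meant to exercise, here is a minimal sketch in plain Python. It does not read the files above: their exact columns and the way their labels are built are not described on this page, so the sketch generates its own synthetic data. The function names, the number of features, and the labelling rule (sign of the sum of the last two features) are all assumptions made for illustration. It ranks features with a simple filter score, the absolute Pearson correlation between each feature and the label.

```python
import math
import random


def make_toy_data(n_rows=500, n_features=6, seed=0):
    """Build normally distributed features (hypothetical stand-in data).

    Assumption: the label depends only on the last two columns,
    through the sign of their sum.
    """
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_rows)]
    y = [1 if row[-1] + row[-2] > 0 else 0 for row in X]
    return X, y


def abs_correlation(xs, ys):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return abs(cov) / math.sqrt(var_x * var_y)


def rank_features(X, y):
    """Score every column against the label and sort columns by score."""
    n_features = len(X[0])
    scores = [abs_correlation([row[j] for row in X], y) for j in range(n_features)]
    ranking = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranking, scores


X, y = make_toy_data()
ranking, scores = rank_features(X, y)
print("features ranked by relevance:", ranking)
print("scores:", [round(s, 3) for s in scores])
```

On this synthetic data the two last columns should come out on top. A univariate score of this kind may miss relations that only show up when features are considered together, which is the situation a file like FSbool.arff is described as covering, so a wrapper or tree-based method may be a better fit there.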

 

Data

UCI KDD and UCI repository

KDnuggets

Kaggle

CMU StatLib

BigML

MLData

DataMob

Hilary Mason’s Dataset collection