Outline

  • Quantitative methods for data mining: qualitative and quantitative variables, samples and populations, descriptive statistics, central tendency, dispersion, bivariate data, probability, binomial distributions.

  • Coding tools for data mining, from simple home-made programs to large, community-supported packages: Python and its Pandas, Seaborn, Matplotlib, and scikit-learn packages.

  • Visualization concepts and styles.

  • Simple visualization with Seaborn and Matplotlib.

  • Data curation and preparation.

  • Data stores and warehouses.

  • Association rules.

  • Classification and clustering.

  • Rules, decision trees, k-NN, k-means.

  • Neural networks.

Course Map

The course is divided into two parts. The first part covers the tools and methods necessary for data mining. The second part covers the theory and techniques of data mining. (Slide deck with course map).

Weekly Maintenance Meetings

Additional support for this asynchronous course is available every Monday, 6-7 PM, via videoconference. During these weekly sessions, I can help with coding, troubleshooting, choosing datasets to analyze, and similar questions. Attendance is optional: you may come and go as you please and attend as many sessions as you wish. I ask only that your camera be on when you join the video session.

For privacy, the meeting details are shared by email only.