Algorithmic Data Analysis

Algorithms play a fundamental role in Data Science: they enable efficient automated data handling, analysis and visualization. Typical data-science problems are often time-sensitive and complex, and not necessarily even well-defined. Developing algorithmic solutions for such challenges that deliver high-quality results in a verifiable, and hence also explainable, manner requires a broad set of algorithmic tools. Such a toolset is therefore an important part of every data scientist's repertoire.

The trajectory contains the following courses: 

  • 2IMA30 - Topological Data Analysis,
  • 2IMA20 - Algorithms for Geovisualization,
  • 2AMS50 - Optimization for Data Science.

One of the key messages of Topological Data Analysis (2IMA30) is that data has shape and the shape matters. Extraction of information from datasets that are high-dimensional, incomplete, and noisy is generally challenging. Topological data analysis (TDA) provides a general framework to analyze such data in a manner that is independent of the particular underlying space and robust to noise. This course covers the basics of computational topology that underlie TDA techniques, as well as applications of TDA to various data-analysis problems.

A significant part of today's data is geographic, has a geographic component (is geo-referenced), or benefits from geographic interpretation. Such data is often visualized to support analysis. However, automatically computing visualizations from data often involves ill-defined problems that require advanced algorithmic tools. Algorithms for Geovisualization (2IMA20) focuses on modeling and solving such ill-defined problems in geovisualization (or automated cartography), that is, on computing high-quality visualizations (maps) of geographic data. The course also makes occasional excursions to related topics, either involving geographic data or in information visualization.

Mathematical optimization, in particular linear, integer and non-linear optimization, is increasingly applied to inform decision making. Improved algorithms, together with increased computational power, allow large-scale problems to be solved to (near) optimality. Such optimization is the final step in a process that starts by determining what is happening (descriptive analytics) and what the likely outcomes of decisions are (predictive analytics). In this final step, mathematical optimization techniques are applied to determine which combination of actions leads to the best outcome, and thereby to improve decisions (prescriptive analytics). Optimization for Data Science (2AMS50) gives students an overview of some important classes of problems that can be tackled with mathematical programming, as well as the knowledge to solve them using dedicated solvers.