Data Challenges

Data science is rapidly gaining importance in both commercial and governmental settings, and data analytic expertise is needed to address a broad range of challenges in areas including economics, law, social sciences, and the humanities. The Data Challenges USE learning line teaches students how to set up scientifically sound research using existing and available data. This includes the formulation of the research question, the selection of the data, the evaluation of ethical, legal, and societal issues, and the selection of the right tools for analyzing the data set. This is important in both scientific and business environments, as errors in the use, analysis, and interpretation of data sets may have severe consequences, including financial and reputational losses. Students will need to think independently and creatively to obtain and defend their results scientifically. They will also learn to communicate their findings both to specialists in the field and to a non-specialist audience.

The objective of this multi-disciplinary and integrative series of courses is to teach students how to perform correct, large-scale, data-driven analyses themselves, using the technical and societal skills acquired in their major. The prerequisites for this learning line are Data engineering, USE basis, statistics, and programming.

The Data Challenges are taught jointly by Eindhoven and Tilburg, which means that half of the educational activities take place in Tilburg. In the academic year 2017-2018, only a limited number of places is available for students who are not enrolled in the Data Science major.

Exploration, specialization, application

The Data Challenges require students to solve real-world problems, using large data sets from different domains, such as business, economics, law, health, social sciences, culture, and education. The data sets will be acquired from external partners, who also have the role of stakeholders. The Data Challenges are organized in a collaborative-competitive format, in which groups (6-8 students) are encouraged to tackle the problem from different perspectives and to challenge each other in the spirit of a friendly competition (gamification). The courses follow a design-based learning approach, with plenary meetings and group sessions.

In each course, students work through all phases of exploration, specialization, and application. The courses increase in complexity and also address entrepreneurial, legal, and ethical questions throughout. Along the way, students may acquire additional technical skills that are related to obtaining data (for instance, extracting data from databases or data wrangling), performing the analysis (for instance, data analysis based on mathematical modeling or machine learning), or creating (interactive) visualizations.

Data Challenge 1 lets students independently solve a well-defined analysis problem. Students have to identify and execute a suitable analysis approach by themselves. They will learn how a high-level analysis question (asked by a particular stakeholder) can be refined into several concrete sub-questions whose results together answer the original question (for the original stakeholder). The current idea is to use the Dutch NSE data, with the marketing departments of TU/e and Tilburg as the stakeholders.

Data Challenge 2 lets students identify a well-defined problem in a larger problem landscape and solve it for a particular (external) problem owner. Conducting an analysis that serves the purposes of the problem owner and communicating the results to the problem owner, i.e., a non-technical audience, are central elements. The focus is on gaining proficiency in taking external (USE) factors into account when conducting data analysis. A data set will be provided in which multiple stakeholders have an interest. Students pick the perspective of one of the stakeholders in the analysis.

Data Challenge 3 lets students combine all of the previous elements by exploring a complex data set provided by an external stakeholder. They will be given an open analysis task to learn as much as possible about the data, and they have to identify a well-defined problem (from different technical and USE angles) in a large problem space, which is then solved and answered for the stakeholder. The students will recognize, describe, select, and apply the techniques needed to answer the research questions, such as obtaining data from different sources, applying particular analysis techniques, and/or visualizing results. They identify and report on the technical, ethical, legal, and entrepreneurial aspects of their project.

Schedule

Data Challenge 1: Year 2, Q2

Data Challenge 2: Year 2, Q3

Data Challenge 3: Year 3, Q1