This course discusses quantitative methods for analyzing "big data", i.e.,
data sets with many observations, many variables, or both. First, the
course covers flexible or "nonparametric" econometric methods for data with
many observations, where "flexible" means that the researcher aims to impose
as few behavioral assumptions as possible. These methods are often more
accurate than standard approaches such as OLS, which assumes a linear relation
between the explanatory and dependent variables that might not hold in
reality.
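As a brief illustration of this point, the following minimal sketch (in Python for illustration, although the course's PC sessions use R; all data are simulated) contrasts OLS with a simple Nadaraya-Watson kernel regression, one of the nonparametric methods covered later, on data with a nonlinear relation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-2, 2, n)
y = x**2 + rng.normal(0, 0.3, n)  # true relation is quadratic, not linear

# OLS with an intercept: forced to fit a straight line
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
ols_fit = X @ beta

# Nadaraya-Watson kernel regression with a Gaussian kernel
# (bandwidth h chosen ad hoc here; bandwidth selection is a course topic)
def kernel_reg(x0, x, y, h=0.3):
    w = np.exp(-0.5 * ((x0 - x) / h) ** 2)
    return np.sum(w * y) / np.sum(w)

np_fit = np.array([kernel_reg(x0, x, y) for x0 in x])

# The kernel fit tracks the curvature that the linear OLS fit misses
mse_ols = np.mean((y - ols_fit) ** 2)
mse_np = np.mean((y - np_fit) ** 2)
print(mse_np < mse_ols)
```

Because the true conditional mean is curved, the flexible kernel estimator attains a much lower in-sample error than the misspecified linear fit.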
Second, the course discusses so-called "machine learning" approaches for
data that include many variables, in order to optimally exploit the vast
information these variables provide. Separating relevant from irrelevant
information is key in a world with ever-increasing data availability.
The following topics will be covered in the course:
* Flexible (non/semiparametric) vs. parametric statistical (or econometric) models
* Nonparametric regression methods: Kernel regression, series approximation,
smoothing splines
* Methods for choosing smoothing and bandwidth parameters
* Testing: nonparametric specification and distribution tests
* Machine learning based on shrinkage and variable selection: Lasso and ridge
regression
* Machine learning based on decision trees, bagged trees, and random forests
* Introduction to further machine learning methods: boosting, support vector
machines, neural nets, and ensemble methods
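To give a flavor of the shrinkage topic above, the following minimal sketch (in Python for illustration; the course's PC sessions use R) applies the closed-form lasso and ridge solutions that hold under an orthonormal design, an assumption made here purely to keep the example short:

```python
import numpy as np

# Hypothetical OLS coefficients under an orthonormal design,
# where lasso and ridge have simple closed forms
b_ols = np.array([3.0, -2.0, 1.5, 0.08, -0.05, 0.02])
lam = 0.5  # penalty strength (tuning parameter)

# Ridge: proportional shrinkage toward zero; coefficients never become exact zeros
b_ridge = b_ols / (1 + lam)

# Lasso: soft-thresholding; coefficients below the threshold are set exactly to zero,
# which is why the lasso performs variable selection
b_lasso = np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam, 0)

print(b_lasso)
```

The three small coefficients are dropped to exactly zero by the lasso but merely shrunk by ridge, illustrating why the lasso doubles as a variable-selection device.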
The lecture is accompanied by four PC sessions based on the statistical
software package R, in which the methods are applied to empirical data.
- Teacher: Martin Huber
- Teacher: Sarina Joy Oberhänsli
- Teacher: Andreas Stoller