Knowledge Preparation

To succeed in the Data Science program, students need knowledge from three pillars (Mathematics, Statistics, and Computing Science) as well as domain knowledge. Because Data Science is such a new discipline, it is extremely rare for incoming students to have a thorough background in all three areas. We will select the students we believe are most likely to fill the expected knowledge gaps within a short period of time (e.g., through self-study over the summer months, online resources such as videos, short courses, or by building on topics from past courses) and to succeed in the program thereafter.

Please note that the four courses listed in the Admission Requirements are the bare minimum for admission to this program. To succeed in the Data Science program and to be prepared for later research projects, students are encouraged to strengthen their preparation in all three pillars. To that end, we provide the list of topics below as a guide for students to assess their readiness.

1. Mathematics Pillar:

Linear algebra:

• vector space proofs
• matrix inversion theorems
• diagonalization/decompositions
• orthogonalization and projections
• matrix equations

Calculus:

• single-variable and multivariate differential and integral calculus

2. Statistics Pillar (typically 2-3 upper-level statistics courses):

• introduction to descriptive statistics
• basic probability: expectation, total variance, double (iterated) expectation, moment generating functions
• introduction to measures of central tendency
• introduction to regression
• introduction to sampling and experimental designs
• matrix-algebra and calculus-based derivations of least-squares solutions (simple and multiple linear regression)
• model diagnostics, model selection
• derivations of common distributions (e.g., Poisson, t, chi-square, gamma)
• theory and applications of test statistics and confidence interval construction
• maximum likelihood estimation
• Bayesian methods, including derivations
• likelihood ratio tests, including proofs
• proof of the central limit theorem
• Note: A PDF file listing more detailed online resources on regression and inference topics is available for download.

3. Computing Science Pillar (databases and algorithms):

• basic methods of representing data in computing science
• implementation and analysis of fundamental data structures (e.g., lists, stacks, queues, and graphs)
• implementation of algorithms using data structures
• cost trade-offs of each data type
• database concepts
• database design techniques, including the entity-relationship model and object-oriented approaches to designing database systems
• data description language, data manipulation language (updates, queries, reports), and data integrity