- Collect the data needed for the ML implementation
- Clean collected data as necessary
- Collaborate with the ML Engineer on data analysis, e.g. detecting useful patterns in the datasets
- A degree in mathematics, statistics, data science or a related discipline
- Strong command of statistical tests, distributions, maximum likelihood estimators, etc., with specific reference to machine learning methods such as k-nearest neighbors, random forests and ensemble methods.
- Ability to understand when different techniques are (or aren’t) a valid approach.
- Ability to deal with imperfections in data (missing values, inconsistent string formatting, etc.).
- Knowledge of popular scientific libraries in Java or Scala (Deeplearning4j, Breeze, Saddle, Spark MLlib, PredictionIO, etc.)
- Ability to communicate your findings in plain English and/or using visualization tools (charts, maps, etc.).
- Ability to work collaboratively in a fast-paced environment.