DataScience/Interview.md at master · fujunswufe/DataScience

###Resume

Yelp Data Challenge
1. Use the Feynman Technique to explain Word2Vec, TF-IDF, and LDA
2. Explain TextRank and sentence clustering
3. Explain ROUGE and BLEU
Springleaf Marketing Response
1. The preprocessing pipeline
2. Feature engineering, especially feature ranking. For tree-based methods, however, filtering out correlated features is generally unnecessary
3. Explain the principle behind XGBoost
Twitter Answerers
1. Explain Neo4j in a few sentences
2. How to model the Twitter data
3. What a probability model is, and the smoothing problem
4. Natural Language Processing with Graph Databases
Keyword Search on Neo4j
1. RDBMS to Neo4j
2. REST API and Maven
3. How to do keyword search

###Interview Preparation

###Hadoop&Spark###

###Machine Learning###

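Reference: "Data Scientist找工小记" (notes on a data scientist job search)
Master regression, classification, and clustering
Fully understand the difference between Logistic Regression and Naive Bayes (what a generative model is versus a discriminative model)
Fully understand XGBoost and SVM
Summary
1. Why AUC is insensitive to class imbalance
2. When Random Forests do better than Gradient Boosting, and when they do worse than GBM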
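
A minimal sketch for the first summary question, using made-up scores (the numbers and the helper names `auc` and `precision_at` are illustrative assumptions, not from any library). ROC AUC can be read as the probability that a randomly chosen positive is scored above a randomly chosen negative. Since TPR and FPR are each computed within a single class, replicating the negatives tenfold leaves AUC unchanged, as long as the score distribution within each class stays the same. Precision mixes both classes and drops.

```python
def auc(scores_pos, scores_neg):
    """ROC AUC as P(score of random positive > score of random negative); ties count 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def precision_at(threshold, scores_pos, scores_neg):
    """Precision when everything scored >= threshold is predicted positive."""
    tp = sum(s >= threshold for s in scores_pos)
    fp = sum(s >= threshold for s in scores_neg)
    return tp / (tp + fp)


# Hypothetical classifier scores for each class.
pos = [0.9, 0.8, 0.6, 0.4]
neg = [0.7, 0.5, 0.3, 0.2]

# Balanced data: 4 positives vs 4 negatives.
balanced_auc = auc(pos, neg)
balanced_prec = precision_at(0.5, pos, neg)

# Imbalanced data: same positives, negatives replicated 10x (4 vs 40).
imbalanced_auc = auc(pos, neg * 10)
imbalanced_prec = precision_at(0.5, pos, neg * 10)

print(balanced_auc, imbalanced_auc)    # AUC is the same in both settings
print(balanced_prec, imbalanced_prec)  # precision falls as negatives grow
```

This only illustrates the invariance to class ratio. It does not claim that AUC is always the right metric for imbalanced problems.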

###NLP###
