### Resume
-
Yelp Data Challenge
- Use the Feynman Technique to explain Word2Vec, TF-IDF and LDA
- Explain TextRank and sentence clustering
- Explain ROUGE and BLEU
-
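For the ROUGE and BLEU item above, the core difference is recall versus precision over n-grams. Below is a minimal sketch in Python that assumes pre-tokenized input and a single reference. The function names are hypothetical, and `bleu` is plain sentence-level BLEU (uniform weights, brevity penalty, no smoothing), not any particular library's implementation:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: overlapping n-grams / n-grams in the reference."""
    ref = ngrams(reference, n)
    overlap = sum((ngrams(candidate, n) & ref).values())  # & keeps the min count
    total = sum(ref.values())
    return overlap / total if total else 0.0

def modified_precision(candidate, reference, n):
    """BLEU's clipped n-gram precision: overlap / n-grams in the candidate."""
    cand = ngrams(candidate, n)
    clipped = sum((cand & ngrams(reference, n)).values())
    total = sum(cand.values())
    return clipped / total if total else 0.0

def bleu(candidate, reference, max_n=4):
    """Sentence BLEU: one reference, uniform weights, no smoothing."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # geometric mean is zero if any n-gram order has no match
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)  # brevity penalty
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "the cat sat on the mat".split()
print(rouge_n_recall("the cat sat".split(), ref))             # 0.5 (3 of 6 reference unigrams)
print(modified_precision("the the the the".split(), ref, 1))  # 0.5 ("the" clipped to 2)
```

ROUGE divides by the reference's n-gram count and BLEU by the candidate's. So the short candidate "the cat sat" has perfect unigram precision but recovers only half the reference, which is the case BLEU's brevity penalty is meant to punish.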
Springleaf Marketing Response
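- Walk through the preprocessing steps
- Feature engineering, especially feature ranking. For tree-based learners, however, dropping correlated features is generally unnecessary: each split uses one feature at a time, so multicollinearity does not hurt predictions (though it can spread importance scores across the correlated features)
- Explain the principle behind XGBoost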
-
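For the XGBoost item above, XGBoost approximates the loss with a second-order Taylor expansion. Let G and H be the sums of gradients and Hessians over a node's instances. The optimal leaf weight is then `w* = -G / (H + λ)`, and splitting a node yields `gain = ½ [G_L²/(H_L + λ) + G_R²/(H_R + λ) - (G_L + G_R)²/(H_L + H_R + λ)] - γ`. Below is a minimal sketch of the exact greedy split search on a single feature, assuming log loss. The function names are hypothetical, and this is not the xgboost library's code:

```python
import math

def grad_hess_logloss(y, raw_score):
    """Gradient and Hessian of log loss w.r.t. the raw (pre-sigmoid) score."""
    p = 1.0 / (1.0 + math.exp(-raw_score))
    return p - y, p * (1.0 - p)

def leaf_weight(G, H, lam):
    """Optimal leaf value w* = -G / (H + lambda)."""
    return -G / (H + lam)

def split_gain(GL, HL, GR, HR, lam, gamma):
    """Loss reduction from splitting one node into left/right children."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma

def best_split(x, y, raw_scores, lam=1.0, gamma=0.0):
    """Exact greedy search over thresholds of one feature; returns (gain, threshold)."""
    # Sort instances by feature value, carrying each one's gradient and Hessian.
    stats = sorted((xi, *grad_hess_logloss(yi, si))
                   for xi, yi, si in zip(x, y, raw_scores))
    G = sum(g for _, g, _ in stats)
    H = sum(h for _, _, h in stats)
    GL = HL = 0.0
    best = (float("-inf"), None)
    for i in range(len(stats) - 1):
        xi, g, h = stats[i]
        GL += g
        HL += h
        if xi == stats[i + 1][0]:
            continue  # cannot split between equal feature values
        gain = split_gain(GL, HL, G - GL, H - HL, lam, gamma)
        if gain > best[0]:
            best = (gain, (xi + stats[i + 1][0]) / 2)
    return best

gain, threshold = best_split([1, 2, 3, 4], [0, 0, 1, 1], [0.0] * 4)
print(threshold, round(gain, 4))  # 2.5 0.6667
```

Both the leaf weight and the gain depend on the data only through G and H. That is why one tree-building routine can serve any twice-differentiable loss: swapping the loss here only changes `grad_hess_logloss`.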
Twitter Answerers
- Explain Neo4j in a few sentences
- How to model the Twitter data
- What a probabilistic model is, and the smoothing problem (assigning nonzero probability to unseen events)
- Natural Language Processing with Graph Databases
-
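For the probability-model and smoothing item above, a maximum-likelihood unigram model assigns zero probability to any word a user never wrote. A single unseen query word then zeroes out the whole product. Below is a minimal sketch of two standard fixes: additive (Laplace) smoothing, and Jelinek-Mercer interpolation with a background collection model. These notes do not say which smoothing the Twitter Answerers project used, and the function names and the λ = 0.5 default are assumptions:

```python
import math
from collections import Counter

def mle_prob(word, doc_tokens):
    """Unsmoothed maximum-likelihood estimate: count / document length."""
    return Counter(doc_tokens)[word] / len(doc_tokens)

def laplace_prob(word, doc_tokens, vocab_size, alpha=1.0):
    """Additive smoothing: (count + alpha) / (doc_len + alpha * |V|)."""
    return (Counter(doc_tokens)[word] + alpha) / (len(doc_tokens) + alpha * vocab_size)

def jelinek_mercer_prob(word, doc_tokens, collection_tokens, lam=0.5):
    """Interpolate the document model with a background (collection) model."""
    p_doc = Counter(doc_tokens)[word] / len(doc_tokens)
    p_bg = Counter(collection_tokens)[word] / len(collection_tokens)
    return (1 - lam) * p_doc + lam * p_bg

def query_log_likelihood(query, doc_tokens, collection_tokens, lam=0.5):
    """Score a document (e.g. one user's tweets) by log P(query | doc).
    Assumes every query word occurs somewhere in the collection."""
    return sum(math.log(jelinek_mercer_prob(w, doc_tokens, collection_tokens, lam))
               for w in query)

user_tweets = ["spark", "hadoop", "spark"]
collection = user_tweets + ["neo4j", "graph", "neo4j", "spark"]
print(mle_prob("neo4j", user_tweets))                         # 0.0 (unseen word)
print(laplace_prob("neo4j", user_tweets, vocab_size=4))       # 1/7
print(jelinek_mercer_prob("neo4j", user_tweets, collection))  # 0.5 * 2/7 = 1/7
```

Laplace smoothing spreads the same extra mass over every vocabulary word. Jelinek-Mercer instead gives an unseen word probability in proportion to how common it is in the whole collection, which usually suits retrieval-style scoring better.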
Keyword Search on Neo4j
- Migrating from an RDBMS to Neo4j
- REST API and Maven
- How to implement keyword search
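For the keyword-search item above, one common approach is an inverted index from keywords to node ids, where a query returns the nodes whose text properties contain every keyword. This is an assumption, not necessarily how the project did it. Below is a minimal in-memory sketch in Python. The node dictionary is a hypothetical stand-in for data exported from Neo4j, and all names are illustrative:

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(nodes):
    """Inverted index: keyword -> set of node ids whose property values contain it.

    `nodes` is a hypothetical dict of node_id -> {property_name: value}.
    """
    index = defaultdict(set)
    for node_id, props in nodes.items():
        for value in props.values():
            for token in tokenize(str(value)):
                index[token].add(node_id)
    return index

def keyword_search(index, query):
    """Ids of nodes matching ALL query keywords (AND semantics)."""
    postings = [index.get(token, set()) for token in tokenize(query)]
    return set.intersection(*postings) if postings else set()

nodes = {
    1: {"name": "Graph Databases", "type": "book"},
    2: {"name": "Spark Guide", "type": "book"},
    3: {"title": "Neo4j graph tutorial"},
}
index = build_index(nodes)
print(keyword_search(index, "graph"))       # {1, 3}
print(keyword_search(index, "graph book"))  # {1}
```

Inside Neo4j itself, a similar lookup could be pushed into the database with a full-text index instead of being built in application code.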
### Interview Preparation
- SQL tutorial on W3Schools (20 hours)
- Probability and statistics (CS229 probability review notes)
- Linear Algebra (CS229 notes)
- LintCode (10 questions per day), following a problem-practice strategy guide for reference
### Hadoop & Spark
- Focus mainly on Spark
- Spark courses on edX
### Machine Learning
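- Reference: "Data Scientist 找工小记" (a data scientist's job-hunting notes)
- Master regression, classification and clustering
- Fully understand the difference between Logistic Regression and Naive Bayes (generative vs. discriminative models)
- Fully master XGBoost and SVM
- Summary
- Why AUC is insensitive to class imbalance
- For example, when Random Forests outperform Gradient Boosting, and when they fall short of GBM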
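For the AUC item above, AUC equals the probability that a randomly chosen positive is scored above a randomly chosen negative. It therefore depends only on the score distribution within each class, not on how many examples each class has. The minimal sketch below checks this by replicating every negative ten times, and shows precision at a fixed threshold for contrast. The toy scores and helper names are made up for illustration:

```python
def auc(scores, labels):
    """AUC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def precision_at(scores, labels, threshold):
    """Fraction of predicted positives (score >= threshold) that are positive."""
    predicted = [y for s, y in zip(scores, labels) if s >= threshold]
    return sum(predicted) / len(predicted)

pos_scores = [0.9, 0.8, 0.6]
neg_scores = [0.7, 0.4, 0.3]

balanced_s = pos_scores + neg_scores
balanced_y = [1] * 3 + [0] * 3
imbalanced_s = pos_scores + neg_scores * 10  # same negatives, 10x as many
imbalanced_y = [1] * 3 + [0] * 30

print(auc(balanced_s, balanced_y), auc(imbalanced_s, imbalanced_y))  # both 8/9
print(precision_at(balanced_s, balanced_y, 0.5))      # 0.75
print(precision_at(imbalanced_s, imbalanced_y, 0.5))  # 3/13
```

Precision moves with the class ratio because its denominator mixes both classes. The ROC curve's axes, TPR and FPR, are each normalized within a single class, so they do not.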
### NLP