###Resume

  1. Yelp Data Challenge

    1. Use the Feynman Technique to explain Word2Vec, TF-IDF, and LDA
    2. Explain TextRank and sentence clustering
    3. Explain ROUGE and BLEU
  2. Springleaf Marketing Response

    1. The preprocessing workflow
    2. Feature engineering, especially feature ranking. Note that for tree-based learning methods, correlation analysis is unnecessary
    3. Explain the principles behind XGBoost
  3. Twitter Answerers

    1. Explain Neo4j in a few sentences
    2. How to model the Twitter data
    3. What a probabilistic model is, and what the smoothing problem is
    4. Natural Language Processing with Graph Databases
  4. Keyword Search on Neo4j

    1. RDBMS to Neo4j
    2. REST API and Maven
    3. How to do keyword search
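
For the smoothing problem in the Twitter Answerers item, a minimal sketch may help when explaining it: without smoothing, a word never seen in training gets probability zero under a count-based model. The example assumes add-one (Laplace) smoothing on a unigram model, which is only one common choice; the function name `unigram_prob`, the toy tokens, and the vocabulary are hypothetical and not taken from the project.

```python
from collections import Counter

def unigram_prob(word, counts, vocab_size, alpha=1.0):
    """Additive-smoothed unigram probability; alpha=0 gives the raw estimate."""
    total = sum(counts.values())
    return (counts[word] + alpha) / (total + alpha * vocab_size)

# Toy training text (illustrative only).
tokens = "great food great service".split()
counts = Counter(tokens)

# Hypothetical vocabulary that includes one word unseen in training.
vocab = set(tokens) | {"terrible"}

p_unsmoothed = unigram_prob("terrible", counts, len(vocab), alpha=0.0)
p_smoothed = unigram_prob("terrible", counts, len(vocab), alpha=1.0)

print(p_unsmoothed)  # the unseen word gets zero probability
print(p_smoothed)    # smoothing moves some probability mass to it
```

The smoothed probabilities still sum to one over the vocabulary, because the `alpha * vocab_size` term in the denominator accounts for the mass added to every word.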

###Interview Preparation

  1. SQL Tutorial on w3Cschools (20 hours)
  2. Probability and statistics (CS229 Prob notes)
  3. Linear Algebra (CS229 notes)
  4. LintCode (10 questions per day); see the problem-practice strategy guide for reference

###Hadoop&Spark###

  1. Focus mainly on Spark
  2. Spark courses on edX

###Machine Learning###

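  1. See "Data Scientist找工小记" (notes on a data scientist job search)
  2. Master regression, classification, and clustering
  3. Fully understand the difference between Logistic Regression and Naive Bayes (what generative and discriminative models are)
  4. Fully understand XGBoost and SVM
  5. Summary
    1. Why AUC is insensitive to class imbalance
    2. For example, in which cases Random Forests outperform Gradient Boosting, and in which cases they fall short of GBM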
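
For the AUC question above, one common explanation is that AUC is the probability that a randomly chosen positive scores higher than a randomly chosen negative, so it depends on ranking within pairs rather than on the class ratio. The sketch below illustrates this under a simplifying assumption: the negatives are replicated, so their score distribution stays the same while their count grows. The helper names `auc` and `precision_at` and the scores are made up for illustration.

```python
def auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def precision_at(threshold, pos_scores, neg_scores):
    """Precision when everything scoring at or above the threshold is predicted positive."""
    tp = sum(s >= threshold for s in pos_scores)
    fp = sum(s >= threshold for s in neg_scores)
    return tp / (tp + fp)

# Made-up classifier scores.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2, 0.1]

balanced_auc = auc(pos, neg)
imbalanced_auc = auc(pos, neg * 50)  # same negative scores, 50x as many

balanced_precision = precision_at(0.5, pos, neg)
imbalanced_precision = precision_at(0.5, pos, neg * 50)

print(balanced_auc, imbalanced_auc)              # unchanged by the class ratio
print(balanced_precision, imbalanced_precision)  # precision drops sharply
```

Precision is included only as a contrast: it mixes counts from both classes, so it shifts with the class ratio, whereas the pairwise definition of AUC normalizes by the number of positives and negatives separately.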

###NLP###