-
Generalized Linear Model 广义线性模型 (GLM)
-
Support Vector Machine 支持向量机 (SVM)
- Linear Kernel 线性核
- Polynomial Kernel 多项式核
- Radial Basis Function/Gaussian Kernel 高斯核 (RBF)
-
Neural Network 神经网络 (NN)
-
Bayesian Models 贝叶斯模型
- Naive Bayes 朴素贝叶斯 (NB)
- Bayesian Network/Belief Network/Directed Acyclic Graphical model 贝叶斯网络/信念网络/有向无环图模型
-
Decision Trees 决策树
- ID3
- C4.5
- Classification and Regression Tree 分类回归树 (CART)
-
Ensemble 模型组合
- Linear Combination 线性组合
- Bootstrap aggregating (Bagging) -> Random Forests 随机森林 (RF)
- Boosting 提升
- Adaptive Boosting 自适应提升 (AdaBoost) -> Boosting Tree 提升树
- Gradient Boosting -> Gradient-Boosted Regression Trees 梯度提升回归树 (GBRT/GBDT)
- L2 Boosting
- Logit Boosting
- Cascade
-
K-means K-均值
-
Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
-
Gaussian Mixture Model 混合高斯模型 (GMM)
-
Power Iteration Clustering (PIC)
-
Association Rules
-
FP-growth
-
PrefixSpan
Feature Engineering 特征工程
-
Feature Construction 特征构建
-
Feature Extraction 特征提取
-
Feature Selection 特征选择
- Filter 过滤式方法
- Correlation Coefficient 相关系数
- Chi-squared Test 卡方检验
- Mutual Information/Information Gain 互信息/信息增益
- Wrapper 封装式方法
- Complete 完全搜索
- Heuristic 启发式搜索
- Random 随机搜索
- Embedded 嵌入式方法
- Regularization 正则化
- Decision Trees 决策树
- Deep Learning 深度学习
Model Evaluation 模型评价
-
Model Validation 模型验证
- Hold-out Validation
- K-fold cross-validation K折交叉验证
- Leave-one-out/Jackknife 留一交叉验证/刀切法
- Bootstrapping 自助法
-
Model Testing 模型测试
- A/B Testing
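
As an illustration of the K-fold cross-validation entry under Model Validation, the following is a minimal sketch of how the index split could be produced. The function name `k_fold_indices` is hypothetical, the folds are contiguous and unshuffled for readability, and real use would normally shuffle the samples first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds.

    Hypothetical helper for illustration; it does not shuffle.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    # Spread the remainder over the first folds so fold sizes differ by at most 1.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val_idx = list(range(start, start + size))
        # Every sample outside the validation fold is used for training.
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


folds = list(k_fold_indices(10, 3))
```

Setting `k` equal to `n_samples` makes every validation fold a single sample, which corresponds to the leave-one-out entry in the same list.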
Model Selection 模型选择
-
Feature Engineering
-
Algorithm Selection
-
Hyperparameter Tuning 超参数调优
- Grid Search 网格搜索
- Random Search 随机搜索
- Smart Search 智能搜索
- Derivative-free optimization
- Bayesian optimization
- random forest smart tuning
-
Content Filtering
-
Collaborative Filtering 协同过滤
- Neighborhood Methods
- Item-oriented
- User-oriented
- Latent Factor Models
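
To make the item-oriented neighborhood method listed under Collaborative Filtering concrete, here is a minimal sketch under simplifying assumptions: ratings are a dense list of lists, `0` marks an unrated item, similarity is plain cosine similarity between item columns, and the prediction is a similarity-weighted average. The names `cosine` and `predict` are hypothetical, and practical systems usually add mean-centering and keep only the top-k neighbors.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length rating vectors (0 = unrated)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict(ratings, user, item):
    """Predict ratings[user][item] from the user's ratings of similar items."""
    columns = list(zip(*ratings))  # one tuple of ratings per item
    num = den = 0.0
    for other, r in enumerate(ratings[user]):
        if other == item or r == 0:
            continue  # skip the target item and items the user has not rated
        s = cosine(columns[item], columns[other])
        num += s * r
        den += abs(s)
    return num / den if den else 0.0


# Rows are users, columns are items; the data is made up for illustration.
ratings = [
    [5, 4, 0],
    [4, 5, 1],
    [1, 0, 5],
]
estimate = predict(ratings, user=0, item=2)
```

A user-oriented variant would apply the same similarity to rows instead of columns.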
Topic Models 主题模型
-
Latent Semantic Indexing 潜语义索引 (LSI)
-
Probabilistic Latent Semantic Indexing 概率潜语义索引 (pLSI) [SIGIR 1999]
Sequence Labeling 序列标注
-
Hidden Markov Model 隐马尔科夫模型 (HMM)
-
Maximum Entropy Markov Model 最大熵马尔科夫模型 (MEMM)
- Label Bias Problem 标注偏置问题
-
Markov Random Field 马尔科夫随机场 (MRF)
-
Conditional Random Field 条件随机场 (CRF)
-
AutoEncoder 自动编码器
- Sparse AutoEncoder 稀疏自动编码器
- Denoising AutoEncoders 降噪自动编码器
-
Sparse Coding 稀疏编码
-
Restricted Boltzmann Machine 限制波尔兹曼机 (RBM)
-
Deep Belief Networks 深信度网络
-
Convolutional Neural Networks 卷积神经网络
-
Underfitting vs. Overfitting 欠拟合与过拟合
- Bias vs. Variance 偏差与方差
-
Empirical Risk Minimization vs. Structural Risk Minimization 经验风险与结构风险 (ERM vs. SRM)
-
Regularization 正则化
- Ridge Regression 岭回归
- Least Absolute Shrinkage and Selection Operator 最小绝对值收敛和选择算子算法 (LASSO)
-
Normalization 归一化
-
Learning Curve 学习曲线
-
Discriminative Model vs. Generative Model 判别式模型与生成式模型
-
Parametric Model vs. Nonparametric Model 参数模型和非参数模型
-
Eigenvalue Decomposition 特征值分解
-
Singular Value Decomposition 奇异值分解 (SVD)
-
Low Rank Matrix Decomposition 低秩矩阵分解
- Stochastic Gradient Descent
- Alternating Least Squares (ALS)
-
Probability Distributions 概率分布
- Conjugate Prior 共轭先验
- Beta distribution and Binomial distribution
- Dirichlet distribution and Multinomial distribution
- Exponential Family 指数族
- Gaussian Distribution
- Binomial Distribution
- Poisson Distribution
- Gamma Distribution
- Exponential Distribution
- Beta Distribution
- Dirichlet Distribution
-
Parameter Estimation 参数估计方法
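
The Beta and Binomial pairing listed under Conjugate Prior gives a compact example of parameter estimation. The sketch below assumes a Beta(alpha, beta) prior on a coin's success probability and compares three point estimates. The outline does not enumerate specific estimators, so the choice of maximum likelihood, MAP, and posterior mean is an assumption made for illustration, as are the function names and the counts used.

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing Binomial counts.

    Conjugacy means the posterior stays in the Beta family:
    Beta(alpha + successes, beta + failures).
    """
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of Beta(alpha, beta)."""
    return alpha / (alpha + beta)


def beta_mode(alpha, beta):
    """Mode of Beta(alpha, beta); valid only when alpha > 1 and beta > 1."""
    return (alpha - 1) / (alpha + beta - 2)


# Illustrative numbers: a Beta(2, 2) prior and 7 successes in 10 trials.
successes, failures = 7, 3
posterior = beta_binomial_update(2.0, 2.0, successes, failures)

mle = successes / (successes + failures)  # ignores the prior
map_estimate = beta_mode(*posterior)      # mode of the posterior
posterior_mean = beta_mean(*posterior)    # mean of the posterior
```

With these numbers both Bayesian estimates lie between the prior mean of 0.5 and the maximum-likelihood estimate of 0.7, which shows the prior pulling the estimate toward its own mean.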