Evaluation metrics for imbalanced datasets
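
The note above does not say which metrics are meant. As a minimal sketch, assuming the usual candidates (precision, recall, and F1 alongside plain accuracy), the example below shows why accuracy alone can mislead on an imbalanced dataset. The function name `imbalance_metrics` and the toy labels are made up for illustration.

```python
def imbalance_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for the `positive` class.

    Hypothetical helper for illustration; ratios with a zero
    denominator are reported as 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy data (an assumption): 95 negatives, 5 positives, and a classifier
# that always predicts the majority class.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
print(imbalance_metrics(y_true, y_pred))
```

On this toy data the accuracy is high even though not a single minority sample is found, which is what recall and F1 expose.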

Why do feature selection? To reduce dimensionality.

Entropy measures the uncertainty of a system.
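
As a small illustration, assuming the Shannon entropy of a discrete distribution measured in bits, H = -Σ p·log2(p), the sketch below computes it from probabilities or from a list of class labels. The function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(probs):
    """Entropy in bits of a discrete distribution given as probabilities.

    Zero-probability outcomes contribute nothing and are skipped.
    """
    return sum(-p * math.log2(p) for p in probs if p > 0)


def label_entropy(labels):
    """Entropy in bits of the empirical class distribution of `labels`."""
    counts = Counter(labels)
    total = len(labels)
    return shannon_entropy(c / total for c in counts.values())


print(shannon_entropy([0.5, 0.5]))   # fair coin: 1.0 bit
print(label_entropy(["a", "a", "a", "b"]))
```

A certain outcome has entropy 0, and a uniform distribution has the largest entropy for a given number of outcomes.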

Branch and bound

PCA (Principal Component Analysis): PCA does not take class labels into account.

LDA (Linear Discriminant Analysis): the goal of LDA is to reduce dimensionality while preserving as much of the separation between classes as possible.

Maximize the distance between classes while minimizing the scatter within each class.
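
For the two-class case, this objective is commonly written as the Fisher criterion. The symbols below (w for the projection direction, m_1 and m_2 for the class means, S_B and S_W for the between-class and within-class scatter matrices) are notation assumed here, not taken from the notes.

```latex
J(\mathbf{w}) = \frac{\mathbf{w}^{\top} S_B \, \mathbf{w}}{\mathbf{w}^{\top} S_W \, \mathbf{w}},
\qquad
S_B = (\mathbf{m}_1 - \mathbf{m}_2)(\mathbf{m}_1 - \mathbf{m}_2)^{\top},
\qquad
S_W = \sum_{c=1}^{2} \sum_{\mathbf{x} \in \mathcal{C}_c} (\mathbf{x} - \mathbf{m}_c)(\mathbf{x} - \mathbf{m}_c)^{\top}

\mathbf{w}^{*} \propto S_W^{-1} (\mathbf{m}_1 - \mathbf{m}_2)
```

The numerator grows with the distance between the projected class means, and the denominator grows with the spread of each class along w, so maximizing J trades the two off. The closed-form direction holds when S_W is invertible.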

PCA worked example
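
The original worked example is not reproduced here. As a stand-in, the following is a minimal sketch for 2-D data only, using the closed-form eigen-decomposition of a 2x2 covariance matrix so that every step can be checked by hand. The data points, the function name `pca_2d`, and the choice of the sample covariance (n - 1 denominator) are assumptions.

```python
import math


def pca_2d(points):
    """PCA for 2-D points.

    Returns ((lam1, lam2), axis, scores): the two eigenvalues of the
    sample covariance matrix (largest first), the unit vector of the
    first principal axis, and each point's coordinate on that axis.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)

    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mid = (a + c) / 2
    half_gap = math.hypot((a - c) / 2, b)
    lam1, lam2 = mid + half_gap, mid - half_gap

    # Eigenvector for lam1; fall back to a coordinate axis when b is ~0.
    if abs(b) > 1e-12:
        vx, vy = b, lam1 - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    axis = (vx / norm, vy / norm)

    # Project the centered points onto the first principal axis.
    scores = [x * axis[0] + y * axis[1] for x, y in centered]
    return (lam1, lam2), axis, scores


# Made-up data for illustration.
data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0)]
eigenvalues, axis, scores = pca_2d(data)
print("eigenvalues:", eigenvalues)
print("first axis:", axis)
print("1-D scores:", scores)
```

Class labels never enter the computation: the axis is chosen purely by variance.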

LDA worked example
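
The original worked example is not reproduced here. As a stand-in, the sketch below handles the two-class, 2-D case, assuming the standard Fisher solution in which the projection direction is proportional to the inverse within-class scatter matrix applied to the difference of the class means. The data and the function name `fisher_lda_2d` are made up for illustration.

```python
import math


def fisher_lda_2d(class_a, class_b):
    """Two-class Fisher LDA for 2-D points; returns the unit direction w."""

    def mean(points):
        n = len(points)
        return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

    def scatter(points, m):
        # Entries (sxx, sxy, syy) of sum((p - m)(p - m)^T) over the points.
        sxx = sum((x - m[0]) ** 2 for x, _ in points)
        sxy = sum((x - m[0]) * (y - m[1]) for x, y in points)
        syy = sum((y - m[1]) ** 2 for _, y in points)
        return sxx, sxy, syy

    m_a, m_b = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, m_a), scatter(class_b, m_b)
    # Within-class scatter S_W = S_a + S_b.
    sxx, sxy, syy = sa[0] + sb[0], sa[1] + sb[1], sa[2] + sb[2]

    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("within-class scatter matrix is singular")

    # w = S_W^{-1} (m_a - m_b), using the explicit 2x2 inverse.
    dx, dy = m_a[0] - m_b[0], m_a[1] - m_b[1]
    wx = (syy * dx - sxy * dy) / det
    wy = (-sxy * dx + sxx * dy) / det

    norm = math.hypot(wx, wy)
    if norm == 0:
        raise ValueError("class means coincide; no discriminant direction")
    return (wx / norm, wy / norm)


def project(points, w):
    """1-D coordinates of the points along direction w."""
    return [x * w[0] + y * w[1] for x, y in points]


# Made-up data for illustration.
class_a = [(0, 0), (1, 0), (0, 1), (1, 1)]
class_b = [(4, 0), (5, 0), (4, 1), (5, 1)]
w = fisher_lda_2d(class_a, class_b)
print("direction:", w)
print("class a:", project(class_a, w))
print("class b:", project(class_b, w))
```

With two classes the result is a single direction, so the data is reduced to one dimension. When the two class means coincide, the mean difference is zero and the sketch raises an error instead of returning a direction.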

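Limitations of LDA: for a C-class problem, LDA can reduce the data to at most C-1 dimensions, and it cannot separate two groups well when their means are close.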

Papers to study

  • Data cleaning
    M. A. Hernandez and S. J. Stolfo, “Real-World Data is Dirty: Data Cleansing and The Merge/Purge Problem,” Data Mining and Knowledge Discovery, vol. 2, pp. 9–37, 1998.
  • Missing data
    A. Donders, G. van der Heijden, T. Stijnen, and K. Moons, “Review: A Gentle Introduction to Imputation of Missing Values,” Journal of Clinical Epidemiology, vol. 59, pp. 1087–1091, 2006.
  • Imbalanced datasets
    N. Japkowicz and S. Stephen, “The Class Imbalance Problem: A Systematic Study,” Intelligent Data Analysis, vol. 6, pp. 429–449, 2002.
    N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic Minority Over-Sampling Technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.
  • Data visualization
    D. Keim, “Information Visualization and Visual Data Mining,” IEEE Transactions on Visualization and Computer Graphics, vol. 8, pp. 1–8, 2002.