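R Visualization: Exploring the iris Dataset
Preface
Kaggle's data mining competition site hosts a classic exploratory-analysis example that visualizes the iris dataset in many different ways, helping readers understand the relationship between the features and the label through intuitive plots. Kaggle provides a Python implementation at the following link: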
https://www.kaggle.com/benham...
This article reproduces that notebook's code in R.
Code
library(tidyr)
library(dplyr)
library(ggplot2)
library(grid)
library(GGally)
# Let's see what's in the iris data
head(iris)
# Let's see how many examples we have of each species
summary(iris$Species)
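# Make scatter plot of Sepal.Length and Sepal.Width (minimal form; the plots between this one and the box plot are not preserved in this copy)
p.scatter <- ggplot(iris) + geom_point(aes(x=Sepal.Length, y=Sepal.Width))
p.scatter
# Box plot of each feature by species, one panel per feature
p.box.facet <- ggplot(iris %>% gather(feature_name, feature_value, one_of(c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")))) + geom_boxplot(aes(x=Species, y=feature_value)) + facet_wrap(~feature_name)
p.box.facet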
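Parallel coordinates plot & Andrews curves
Adapted from: http://cos.name/2009/03/parallel-coordinates-and-andrews-curve/
The idea behind a parallel coordinates (profile) plot is simple and intuitive: take n points on the horizontal axis, one for each indicator (i.e. variable); the vertical axis gives the value of each indicator (or its value after standardization); then connect the points belonging to each observation in order.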
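The standardization mentioned above can be sketched in base R. This is only an illustration under stated assumptions: it uses `scale()` and `matplot()` instead of the ggplot2 pipeline used in the rest of this post, and the variable names `features` and `scaled` are made up for the example.

```r
# Columns 1:4 of the built-in iris data are the four numeric features.
features <- as.matrix(iris[, 1:4])

# scale() centers each column to mean 0 and rescales it to standard deviation 1,
# so features measured on different ranges become comparable on one axis.
scaled <- scale(features)

# One polyline per sample: x positions are the feature indices 1..4 and
# y values are the standardized measurements. matplot() draws one line
# per column, so the matrix is transposed first.
matplot(t(scaled), type = "l", lty = 1,
        col = as.integer(iris$Species),
        xaxt = "n", xlab = "feature", ylab = "standardized value")
axis(1, at = 1:4, labels = colnames(features))
```

Without standardization, a feature with a wide range (such as Petal.Length) dominates the vertical axis; whether that is desirable depends on what the plot is meant to show.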
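The idea behind the Andrews curve (harmonic curve) plot is closely related to the Fourier transform:
using trigonometric functions, each point in n-dimensional space is mapped to a curve in the two-dimensional plane, where the curve parameter x ranges over [-pi, pi].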
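For reference, the usual definition of the Andrews curve can be written out as follows. The notation is an assumption made here for illustration: $(v_1, \dots, v_n)$ denotes one observation and $f_v$ its curve, while $x$ is kept as the curve parameter to match the text.

```latex
f_v(x) = \frac{v_1}{\sqrt{2}} + v_2 \sin(x) + v_3 \cos(x)
       + v_4 \sin(2x) + v_5 \cos(2x) + \cdots,
\qquad -\pi \le x \le \pi
```

With the four iris features the sum stops at the $v_4 \sin(2x)$ term, so each flower is drawn as one curve built from four basis functions.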
# Another multivariate visualization technique is parallel coordinates
# (parallel_coordinates in pandas, which the original notebook uses).
# Parallel coordinates plots each feature on a separate column and then draws lines
# connecting the features for each data sample.
# Each sample gets an id before reshaping so that its four values can be joined into one line.
p.paral <- ggplot(iris %>% mutate(id=1:nrow(iris)) %>% gather(feature_name, feature_value, one_of(c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")))) + geom_line(aes(x=feature_name, y=feature_value, group=id, colour=Species))
p.paral
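# One cool, more sophisticated technique is Andrews curves (andrews_curves in pandas).
# Andrews curves use the attributes of each sample as coefficients of a Fourier series
# and then plot the resulting curves.
# Only the function's final gather() call survives in this copy; the body below is a reconstruction.
andrews_curve <- function(data, label, x = seq(-pi, pi, length.out = 100)) {
  # Rows of `basis` are 1/sqrt(2), sin(x), cos(x), sin(2x): one per feature (assumes four features)
  basis <- rbind(1 / sqrt(2), sin(x), cos(x), sin(2 * x))
  curves <- as.matrix(data) %*% basis
  colnames(curves) <- x
  data.frame(curves, check.names = FALSE, label = label, id = 1:nrow(data)) %>%
    gather(x, y, -label, -id, convert = TRUE)
}
iris.andrew <- andrews_curve(iris[, 1:4], iris$Species)
ggplot(iris.andrew) + geom_line(aes(x=x, y=y, group=id, colour=label))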
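As a sanity check on the Andrews curve construction described above, the curve of a single observation can be evaluated directly in base R. The helper name `andrews_value` is hypothetical and is not part of the original post; it is a minimal sketch that assumes the standard term order 1/sqrt(2), sin(x), cos(x), sin(2x), cos(2x), and so on.

```r
# Hypothetical helper: evaluate the Andrews curve of one observation `v`
# (a numeric vector of features) at the points in `x`.
andrews_value <- function(v, x) {
  n <- length(v)
  # Constant term: first feature divided by sqrt(2).
  total <- rep(v[1] / sqrt(2), length(x))
  if (n >= 2) {
    for (j in 2:n) {
      k <- j %/% 2  # harmonic number: 1, 1, 2, 2, 3, ...
      # Even positions use sine, odd positions use cosine.
      term <- if (j %% 2 == 0) sin(k * x) else cos(k * x)
      total <- total + v[j] * term
    }
  }
  total
}

# First flower in iris: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width.
first <- as.numeric(unlist(iris[1, 1:4]))

# At x = 0 the sine terms vanish and cos(0) = 1, so only the first and
# third features contribute.
at_zero <- andrews_value(first, 0)
print(all.equal(at_zero, first[1] / sqrt(2) + first[3]))
```

Evaluating the helper on a grid such as `seq(-pi, pi, length.out = 100)` gives the y values of one curve, which is what the plot draws for every sample.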
Keywords: r, ggplot2