R Cheetsheets

基本

基本元素、函数

基本元素

函数名 功能
points(x,y) 添加点
lines(x,y) 添加线
text(x,y,labels,…) 在(x,y)处添加用labels指定的文字
mtext(text,side=3,line=0,…) 在边空添加用text指定的文字
segments(x0,y0,x1,y1) 从(x0,y0)各点到(x1,y1)各点画线段
arrows(x0,y0,x1,y1,…) 同上,但添加箭头
abline(a,b) 绘制斜率为b和截距为a的直线
abline(h=y) 在纵坐标y处画水平线, y可以是向量, abline(h=c(1,2,3))
abline(v=x) 在横坐标x处画垂直线, x可以是向量
abline(lm.obj) 画出lm.obj确定的回归线
rect(x1,y1,x2,y2) 绘制长方形,(x1,y1)为左下角,(x2,y2)为右上角
polygon(x,y) 绘制连接各x,y坐标确定的点的多边形
legend(x,y,legend) 在点(x,y)处添加图例,说明内容由legend给定
title() 添加标题,也可添加一个副标题
axis(side,vect) 画坐标轴
box() 在当前的图上加边框
rug(x) 在x轴上用短线画出x数据的位置
locator(n,type=”n”,…) 获取鼠标在图中点击处的坐标

常用绘图函数

函数名 功能
plot(x) 以x的元素值为纵坐标、以序号为横坐标绘图
plot(x,y) x与y的二元作图
pie(x) 饼图
boxplot(x) 盒形图(也称箱线图)
hist(x) x的频率直方图
barplot(x) x的值的条形图
pairs(x) 如果x是矩阵或是数据框,作x的各列之间的二元图
matplot(x,y) 二元图,其中x的第一列对应y的第一列,依次类推
qqnorm(x) 正态分位数-分位数图
image(x,y,z) x,y,z三元图
heatmap(x) 热图
coplot(x~y|z) 关于z的每个数值(或数值区间)绘制x与y的二元图

常见图,例子

ggplot2 doc

柱状图、饼图、玫瑰图、图例、坐标系、主题、颜色等汇总

R/ggplot2基础查询

柱状图、分类柱状图、簇形图、百分比堆积图、饼图极坐标图、折线图

地图

ggplot概念理解

用ggplot2画qqplot

ggplot2叠加多图层

安装包方法

1
2
3
source("http://www.bioconductor.org/biocLite.R")
# 可以一次装多个包
biocLite(c("gridExtra","ggplot2"))

几何性质相关

alpha, color, fill, linetype, shape, size

shape

image

linetype:

0 = blank, 1 = solid, 2 = dashed, 3 = dotted, 4 = dotdash, 5 = longdash, 6 = twodas

数据操作

矩阵操作
合并

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
# 合并两个矩阵
# 纵向合并
cbind(pop,admix)
# 横向合并
rbind(a,b)
#横向合并
ID<-c(1,2,3,4)
name<-c("Jim","Tony","Lisa","Tom")
score<-c(89,22,78,78)
student1<-data.frame(ID,name)
student2<-data.frame(ID,score)
total_student<-merge(student1,student2,by="ID")
total_student
#纵向合并
ID<-c(1,2,3)
name<-c("Jame","Kevin","Sunny")
student1<-data.frame(ID,name)
ID<-c(4,5,6)
name<-c("Sun","Frame","Eric")
student2<-data.frame(ID,name)
total<-rbind(student1,student2)
total

取出部分行、列

1
2
3
4
5
6
7
# 取出1,2,3行、列
admixname=admixsort[c(1,2,3),] # 行
admixpatial=admixsort[,c(1,2,3)] # 列
# 去掉1,2,3行、列
admixpatial=admixsort[-c(1,2,3),] # 行
admixpatial=admixsort[,-c(1,2,3)] # 列

截取、排序、构造新列

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
# 截取大于x的所有数值
a=rnorm(10)
a=a[a>0]
# 排序
对于矩阵pop
针对第3列排序:
pop2=pop[order(pop[,3]),] # 从小到大
pop2=pop[order(-pop[,3]),] # 从大到小
先按第3列,再按第4列排序,再按第5列排序:
pop2=pop[order(pop[,3],pop[,4],pop[,5]),]
如果想以按照第3列从小到大,如果第3列一样,按照第4列从大到小排列:
pop2=pop[order(pop[,3],-pop[,4]),]
# R重新构造一列
a$V3=paste(a$V1,a$V2,sep="-")

ggplot 跳过NA数据

1
2
3
4
ggplot(data=subset(a, !is.na(alcohol_AUDIT)), aes(x=alcohol_AUDIT))+geom_histogram()
# na.rm=T
mean(a$pressure_height,na.rm=T)

stat, pos, binwidth, scale

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
stat_identity
stat_bin
stat='bin' # 默认
stat='identity' #
pos='stack' # 不同类/组的X叠加起来,默认
pos='fill' # 将每个x值的数据归一化到1
pos='dodge' # 独立开来
binwidth=30
# scale
scale_x_log10(...)
scale_y_log10(...)
scale_x_reverse(...)
scale_y_reverse(...)
scale_x_sqrt(...)
scale_y_sqrt(...)
scale_x_datetime()
scale_x_date()
scale_x_discrete()

标签

labs, guides

1
2
3
4
5
6
7
8
9
10
+ labs(title="Title", x='Weight', y='Y:mpg')
+ guides(color=guide_legend(title="Yan Se", title.pos = 'bottom'),size=guide_legend(title='Size',reverse=TRUE))
# guides也可以在相应的scale, theme函数里
+ scale_color_brewer(palette='Set1', guide=guide_legend(title="Yan Se", title.pos = 'bottom'))
# guides 分几列显示
+guides(color=guide_legend(ncol=2))
+guides(shape=guide_legend(ncol=2))
+guides(fill=guide_legend(ncol=2))

单独画坐标、竖线(admixture图里面)

1
2
3
4
5
position=c(0,tapply(1:nrow(admixname),admixname[,3],max))
# 坐标
axis(1,at=position,labels=F)
# 一组竖线
abline(v=position,lwd=0.2)

将绘图区划分成 m 行 n 列

用facet把图分成小图,多图

1
2
3
4
5
6
7
8
9
10
11
12
13
+ facet_wrap(~cut)
+ facet_wrap(~cut, ncol=1)
+ facet_wrap(~cut+clarity, ncol=4)
+ facet_wrap(~clarity+cut, ncol=4, scales="free")
# facet_grid与facet_wrap可以实现相同效果,
+ facet_wrap(~gear, nrow=1)
+ facet_grid(.~gear)
+ facet_wrap(~gear, ncol=1)
+ facet_grid(gear~.)
+ facet_grid(clarity~cut)

facet_wrap()说明

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
facet_wrap(facets, nrow = NULL, ncol = NULL, scales = "fixed",
shrink = TRUE, as.table = TRUE, drop = TRUE)
**Arguments**
facets
一般是“~变量名”或者“~变量一+变量二”
nrow
控制总行数
ncol
控制总列数
scales
这个变量控制不同图像间坐标的尺度关系。"fixed"是指多个坐标轴使用相同的尺度,这个是默认值; "free"是指根据当前坐标轴的内容确定尺度,"free_x"和"free_y"是根据x或者y的数据范围确定当前坐标轴的尺度。
as.table
这个控制最大值的变量的位置,TURE是默认值,意思是最大值在右下角;可以设置成FALSE,那么最大值将在右上角。
drop
这个如果是真会自动删除没有数据的坐标轴。默认是真。

mfrow方法

par(mfrow=c(m,n))

1
2
3
4
5
# 划分成 1行 2列
par(mfrow=c(1,2))
# 直接在原图上叠加
par(new=TRUE)

gridExtra方法,可以和ggplot一起使用

1
2
3
4
5
library(grid)
library(gridExtra)
b=ggplot(a,aes(pos,value))+geom_point(col=1)
c=ggplot(a,aes(pos,value))+geom_point(col=2)
grid.arrange(b,c,ncol=2,nrow=2) # 4个小图,2x2排布

常用图例子

点图 point

应用(PCA)

input file:“input.ggplot2”:

1
2
3
4
5
6
name PC1 PC2 species
sample1 1.10 2.039 D
sample2 0.83 1.928 D
sample3 0.678 0.837 W
sample4 1.828 1.282 W
...

geom_point()

1
2
3
4
5
6
7
8
a=read.table("input.ggplot2",header=T);
mycolor=c("#BF812D","#35978F")
ggplot(data,aes(V2,V3,color=factor(V6)))+geom_point(size=8,alpha=0.8)+xlab("PC1(25.0%)")+ylab("PC2(12.0%)")+theme_bw()+theme(legend.title=element_blank(),panel.grid =element_blank(), panel.grid.major=element_blank())+scale_color_manual(values=mycolor)
# OR
p=ggplot(a,aes(PC1,PC2,color=species))+geom_point(alpha=0.9,size=4)
p+theme_bw()+scale_color_manual(values=c(\"#2166ac\",\"#b2182b\"))
1
2
# 用地域名字/数字来表示点
ggplot(a,aes(C1,C2,color=info))+geom_point(shape=20,size=0.000000001)+geom_text(aes(label=info), size=2)

特殊例子,同时设定颜色color和形状,同时只显示一个legend

scale_colour_manualscale_shape_manual中使用同样的namelabels
link

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
# 数据
state1 <- c(rep(c(rep("N", 7), rep("Y", 7)), 2))
year <- rep(c(2003:2009), 4)
group1 <- c(rep("C", 14), rep("E", 14))
group2 <- paste(state1, group1, sep = "")
beta <- c(0.16,0.15,0.08,0.08,0.18,0.48,0.14,0.19,0.00,0.00,0.04,0.08,0.27,0.03,0.11,0.12,0.09,0.09,0.10,0.19,0.16,0.00,0.11,0.07,0.08,0.09,0.19,0.10)
lcl <- c(0.13,0.12,0.05,0.05,0.12,0.35,0.06,0.13,0.00,0.00,0.01,0.04,0.20,0.00,0.09,0.09,0.06,0.06,0.07,0.15,0.11,0.00,0.07,0.03,0.05,0.06,0.15,0.06)
ucl <- c(0.20,0.20,0.13,0.14,0.27,0.61,0.28,0.27,0.00,1.00,0.16,0.16,0.36,0.82,0.14,0.15,0.13,0.13,0.15,0.23,0.21,0.00,0.15,0.14,0.12,0.12,0.23,0.16)
data <- data.frame(state1,year,group1,group2,beta,lcl,ucl)
# 画图
library(ggplot2)
pd <- position_dodge(.65)
ggplot(data = data,aes(x= year, y = beta, colour = group2, shape = group2)) +
geom_point(position = pd, size = 4) +
geom_errorbar(aes(ymin = lcl, ymax = ucl), colour = "black", width = 0.5, position = pd) +
scale_colour_manual(name = "Treatment & State",
labels = c("Control, Non-F", "Control, Flwr", "Exclosure, Non-F", "Exclosure, Flwr"),
values = c("blue", "red", "blue", "red")) +
scale_shape_manual(name = "Treatment & State",
labels = c("Control, Non-F", "Control, Flwr", "Exclosure, Non-F", "Exclosure, Flwr"),
values = c(19, 19, 17, 17))

image

手动指定颜色、形状,例子:

1
ggplot(a,aes(PC1,PC2,shape=Regions,color=Regions))+geom_point(size=2)+scale_color_manual(name="test",labels=c("AWD","EMZ","NQA","RDA","MODA","BCS","GCN","BMN","BSI","AME","GME","DOP","TEX","BGE","GUR","GAR","SUM","AFS","AW","CC","IROA","KRS","KR","NDZ","SKZ","AWT","ALT","BAS","BAY","DUL","HET","KAZ","QIB","TUB","GLT","HUL","HUS","JZS","LLT","LOP","LTH","LZM","SNT","SSF","STH","TAN","THF","TON","UJI","WDS","YFT","CHA","DQS","GBF","HZS","MBF","OLA","TCS","TIB","TIBQ","TIBS","WNS","LPB","NLB","SPG","ZTS","IROO","IROV","MPS","OCAN","ODAL"),values=c("#E41A1C","#E41A1C","#E41A1C","#E41A1C","#E41A1C","#377EB8","#377EB8","#377EB8","#377EB8","#984EA3","#984EA3","#984EA3","#984EA3","#FF7F00","#FF7F00","#FF7F00","#FF7F00","#A65628","#A65628","#A65628","#A65628","#A65628","#A65628","#A65628","#A65628","#A65628","#b2df8a","#b2df8a","#b2df8a","#b2df8a","#b2df8a","#b2df8a","#b2df8a","#b2df8a","#4DAF4A","#4DAF4A","#4DAF4A","#4DAF4A","#4DAF4A","#4DAF4A","#4DAF4A","#4DAF4A","#4DAF4A","#4DAF4A","#4DAF4A","#4DAF4A","#4DAF4A","#4DAF4A","#4DAF4A","#4DAF4A","#4DAF4A","#e7298a","#e7298a","#e7298a","#e7298a","#e7298a","#e7298a","#e7298a","#e7298a","#e7298a","#e7298a","#e7298a","#4eb3d3","#4eb3d3","#4eb3d3","#4eb3d3","#F781BF","#000000","#000000","#000000","#000000"))+scale_shape_manual(name="test",labels=c("AWD","EMZ","NQA","RDA","MODA","BCS","GCN","BMN","BSI","AME","GME","DOP","TEX","BGE","GUR","GAR","SUM","AFS","AW","CC","IROA","KRS","KR","NDZ","SKZ","AWT","ALT","BAS","BAY","DUL","HET","KAZ","QIB","TUB","GLT","HUL","HUS","JZS","LLT","LOP","LTH","LZM","SNT","SSF","STH","TAN","THF","TON","UJI","WDS","YFT","CHA","DQS","GBF","HZS","MBF","OLA","TCS","TIB","TIBQ","TIBS","WNS","LPB","NLB","SPG","ZTS","IROO","IROV","MPS","OCAN","ODAL"),values=c(1,2,3,4,5,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,1,2,3,4,5,6,7,8,9,10,11,1,2,3,4,1,2,3,4,5))+theme_bw()+theme(plot.background=element_blank(),panel.background=element_blank(),legend.background=element_blank(),legend.position='right',panel.grid =element_blank(),legend.title = element_blank(),axis.title =element_text(size=9,color="black"),axis.text=element_text(size=8,color="black"))+xlab("Principal component 1")+ylab("Principal component 2")+guides(shape=guide_legend(ncol=5),color=guide_legend(ncol=5))

image

plot & points

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
# 用不同的颜色区分不同群体
d1=read.table("D1.txt",header=T);
d2=read.table("D2.txt",header=T);
w=read.table("W.txt",header=T);
d1x=d1$PC1
d1y=d1$PC2
d2x=d2$PC1
d2y=d2$PC2
wx=w$PC1
wy=w$PC2
pdf(file="PC12.pdf",width=2,height=2)
plot(d1x,d1y,pch=4,col="#89c876",cex=2,cex.lab=1.5,lwd=2,xlab="PC1 (3.24%)",ylab="PC2 (1.36%)",xlim=c(-0.3,0.1),ylim=c(-0.6,0.2))
points(d2x,d2y,pch=4,col="#2166ac",cex=2,lwd=2)
points(wx,wy,pch=4,col="#b2182b",cex=2,lwd=2)
dev.off()

barplot 柱状图

barplot

1
2
3
4
5
6
7
8
9
10
# admixture
# k2.txt
0.000000001 0.999999999
0.029249134 0.970750866
0.000000001 0.999999999
8.65071E-08 0.999999913
0.000000001 0.999999999
1.49306E-08 0.999999985
0.043220336 0.956779664
0.04214767 0.95785233
1
2
admix=t(as.matrix(read.table("k2.txt",header=F)))
barplot(admix,col=c("#2166ac","#b2182b"),space=0,border=NULL ,xlab="Individuals",ylab="K=2",cex.lab=2,yaxt='n')

barplot for admixture results example:

1
2
3
4
5
6
7
8
9
# 02.poplist2
...
# small_pop_id sample_id small_pop_order big_pop_id big_pop_order
TIB SRR501845 60 Dom_EA_TIB 9
TIB SRR501863 60 Dom_EA_TIB 9
BGE SRR501854 27 Dom_SA 6
BGE SRR501873 27 Dom_SA 6
GAR SRR501902 26 Dom_SA 6
...
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
par(mfrow=c(10,1), mar=c(0,3,1.5,0.5), las = 0, mgp=c(0.5, 0.5, 0))
color=c("#E41A1C","#377EB8","#4DAF4A","#984EA3","#FF7F00","#A65628","#F781BF")
pop<-read.table("02.poplist2",as.is=T)
admix<-read.table("input.2.Q",as.is=T)
admixmerge=t(as.matrix(cbind(pop,admix)))
admixsort=admixmerge[,order(admixmerge[3,],admixmerge[6,],admixmerge[7,])]
admixpatial=admixsort[-c(1,2,3,4,5),]
admixname=t(admixsort[c(1,2,3,4,5),])
barplot(admixpatial,col=color[1:2],space=0,border=NA,xlab="",ylab="K=2",yaxt='n')
# small population axis
#position=c(0,tapply(1:nrow(admixname),admixname[,3],max))
#axis(1,at=position,labels=F)
# small population line
position=c(0,tapply(1:nrow(admixname),admixname[,3],max))
abline(v=position,lwd=0.2)
# big population line
position2=c(0,tapply(1:nrow(admixname),admixname[,5],max))
abline(v=position2,lwd=1)
# name
barplot(admixpatial,col=c("white"),space=0,border=NA,xlab="",ylab=" ",xaxt='n', yaxt='n')
text(tapply(1:nrow(admixname),admixname[,3],mean),0.8,unique(admixname[,1]),xpd=T,srt=90,cex=0.45)
barplot(admixpatial,col=c("white"),space=0,border=NA,xlab="",ylab=" ",xaxt='n', yaxt='n')
text(tapply(1:nrow(admixname),admixname[,5],mean),0.8,unique(admixname[,4]),xpd=T,srt=0,cex=0.45)

image

geom_bar

给定数据的柱状图
http://blog.csdn.net/bone_ace/article/details/47267981
http://shujuren.org/article/44.html

1
2
# x轴为文本标签时,加上 stat='identity'
ggplot(a,mapping=aes(x=V1,y=V2,fill=V3))+geom_bar(stat='identity')+scale_x_discrete(limits=c("1/302","2/302","3/302","4/302","5/302","6/302","7/302","8/302","9/302","10/302","11/302","12/302","13/302","14/302","15/302","16/302","17/302","18/302","19/302","20/302","21/302","22/302","23/302","24/302","25/302","26/302","27/302","28/302","29/302","30/302","0.1-0.2","0.2-1"))+theme_bw()+theme(panel.border=element_blank(),panel.grid =element_blank(),legend.title=element_blank(),axis.text.x=element_text(angle =90))+scale_fill_manual(values=c("#1F78B4","#FF7F00"))+labs(x="Allele frequency",y="SNP number (M)")

geom_histogram()

1
2
3
4
5
6
7
pdf(file="B1.final.cnv.V3.pdf")
library("ggplot2")
a <- read.table("B1.final.cnv",head=F)
# logx
ggplot(a,aes(x=log(a$V3)/log(10)))+geom_histogram()+xlab("CNV length (log10)")+ylab("Count")+xlim(3,7)
# ggplot(a,aes(x=a$V3))+geom_histogram()+xlab("CNV length (log10)")+ylab("Count")
dev.off()
1
2
3
4
5
6
7
8
9
# histogram纵坐标变成density,默认是count
ggplot(qcctau,aes(x=V1,y=..density..,fill=factor(V2)))+ geom_histogram(position='dodge')
# position (比如直方图)
dodge 并列
fill 标准化堆叠
identity 重叠
jitter 使用jitter方式保证不重叠
stack 堆叠

小提琴图 geom_violin()

1
2
geom_violin()
# 参数:adjust调整平滑程度,scale="count"使得图的面积与每组观测值数目成正比

boxplot 箱线图

boxplot

1
2
3
par(mfrow=c(1,2))
boxplot(bb1$V2,bc1$V2,cc1$V2,outline = F,names=c("B-B","B-C","C-C"),main="1 bp overlap")
boxplot(bb5$V2,bc5$V2,cc5$V2,outline = F,names=c("B-B","B-C","C-C"),main="50% reciprocal overlap")

geom_boxplot

1
2
3
4
5
6
7
8
9
10
# 根据 V3 大小值排序(从小到大)
ggplot(a,aes(x=reorder(V1,V3),y=V3)+geom_boxplot()
# 手动调整顺序
ggplot(b,aes(V1,V4/V5))+geom_boxplot()+xlab("Individuals")+ylab("HET/HOM")+theme( legend.position = "none")+scale_x_discrete(limits=c("BRM","GIR","LQC","NEL","LXC","NYC","QCC","YBC","YNC","FLV","HOL","JBC","JER","LIM","RAN"))
# x、y轴转换,横着的boxplot
+coord_flip()
# 参数 width,outlier.size,outlier.shape

点+erro bar

1
2
3
# 如ABBA-BABA test结果
# range: Mean-3*SE ~ Mean+3*SE
+geom_pointrange(aes(ymin=V4-3*V5,ymax=V4+3*V5))

pheatmap 热图

输入格式 input.txt

1
2
3
4
5
6
7
8
9
10
11
12
B1 B2 B3 B4 B5 B6 C1 C2 C3 C4 C5
B1 1 0.624833110814419 0.616958399636281 0.648058868307536 0.63029315960912 0.628120893561104 0.615183929037308 0.632938076416337 0.637414484548242 0.6375 0.616299232103047
B2 0.624833110814419 1 0.593837535014006 0.65446941975954 0.676940319417204 0.659620596205962 0.653577192038731 0.665760869565217 0.612027158098933 0.626946847960445 0.615502294747578
B3 0.616958399636281 0.593837535014006 1 0.574553571428571 0.538953350698556 0.558471454880295 0.563129002744739 0.557452699584679 0.606948514022604 0.575867205788466 0.572989510489511
B4 0.648058868307536 0.65446941975954 0.574553571428571 1 0.645949535192563 0.629439011837365 0.625958099131323 0.636738906088751 0.62962962962963 0.649846734260788 0.627489072365226
B5 0.63029315960912 0.676940319417204 0.538953350698556 0.645949535192563 1 0.675117112152108 0.667213563029806 0.676982591876209 0.611863155303963 0.635542168674699 0.632996632996633
B6 0.628120893561104 0.659620596205962 0.558471454880295 0.629439011837365 0.675117112152108 1 0.68078348332451 0.652941176470588 0.60755258126195 0.627040194884287 0.650426921145153
C1 0.615183929037308 0.653577192038731 0.563129002744739 0.625958099131323 0.667213563029806 0.68078348332451 1 0.64968152866242 0.611585944919278 0.631986450520203 0.650872817955112
C2 0.632938076416337 0.665760869565217 0.557452699584679 0.636738906088751 0.676982591876209 0.652941176470588 0.64968152866242 1 0.625778629611883 0.652991452991453 0.64551863041289
C3 0.637414484548242 0.612027158098933 0.606948514022604 0.62962962962963 0.611863155303963 0.60755258126195 0.611585944919278 0.625778629611883 1 0.612910332672395 0.581521739130435
C4 0.6375 0.626946847960445 0.575867205788466 0.649846734260788 0.635542168674699 0.627040194884287 0.631986450520203 0.652991452991453 0.612910332672395 1 0.682038275305511
C5 0.616299232103047 0.615502294747578 0.572989510489511 0.627489072365226 0.632996632996633 0.650426921145153 0.650872817955112 0.64551863041289 0.581521739130435 0.682038275305511 1

例子

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
a=read.table("input.txt”)
a=as.matrix(a)
# 默认聚类
pdf("out.cluster.pdf",width=10,height=10,onefile=FALSE)
pheatmap(a,cellwidth = 18, cellheight = 18, fontsize=12, fontsize_row=9,main="title")
dev.off()
# 聚类,但是不显示树
treeheight_row = 0, treeheight_col = 0
# 聚类,同时把聚类的分开画,自己设定分几份
cutree_rows=3, cutree_cols=3
# 不聚类
cluster_rows=0, cluster_cols=0
# 显示格子里的数字
display_numbers=T
# 设置其他样式
fontsize_number=14 # 格子里的文字大小
fontsize=14 # 标签文字大小
border_color="White" # 边界颜色
# 设定分组annotation
annotation_col = data.frame(Type = factor(c(rep(c("item"), 10),"sum")))
rownames(annotation_col) = c(paste("item", 1:10, sep = ""),"PSS")
annotation_row = data.frame(Type = factor(c(rep(c("item"), 10),"sum")))
rownames(annotation_row) = c(paste("item", 1:10, sep = ""),"PSS")
pheatmap(a, display_numbers=T, fontsize_number=14,border_color="White", cutree_rows=3, cutree_cols=3,annotation_col = annotation_col, annotation_row = annotation_row)

其他例子

1
2
3
4
5
6
7
8
9
10
# 最简单地直接出图
pheatmap(data)
# 改变排序算法
pheatmap(data, scale = "row", clustering_distance_row = "correlation")
# 自定义颜色
pheatmap(data, color = colorRampPalette(c("navy", "white", "firebrick3"))(50))
# 关闭图例
pheatmap(data, legend = FALSE)
# 设定格子的尺寸
pheatmap(data, cellwidth = 6, cellheight = 5)

ggplot 调整作图细节

保存文件

1
2
3
pdf(file="plot.allpop.chr.pl.pdf",width=10,height=10)
ggplot ... # 具体图形
dev.off()

自动加label

1
2
3
library(ggrepel)
ggplot(mtcars,aes(wt, mpg,color='red'))+geom_point() +geom_text_repel(aes(wt,mpg, label=rownames(mtcars)))

image

添加注释

添加文本注释

1
2
# 可以加多个
annotate("text", x=3, y=48, label="Group 1") +annotate("text", x=4.5, y=66, label="Group 2")

添加数学表达式注释

1
2
+ annotate("text", x=2, y=0.3, parse=TRUE,label="frac(1, sqrt(2 * pi)) * e ^ {-x^2 / 2}")
# ?plotmath可以见到更多使用数学表达式的例子

添加线条注释

1
2
3
4
# linetype="dashed", size=1
+ geom_hline(yintercept=60) # 横线
+ geom_vline(xintercept=14) # 竖线
+ geom_abline(intercept=37.4, slope=1.75) # 斜线;intercept截距;slope斜率
1
2
3
4
5
6
7
+geom_vline() +geom_hline()
# linetype线的类型(=2 虚线)
# 指定X轴坐标的竖线
+geom_vline(xintercept=1000,color="black", linetype=2)
# 制定Y轴坐标的横线
+ geom_hline(yintercept=10,color="black", linetype=2)
1
2
3
library(plyr)
hw_means <- ddply(heightweight, "sex", summarise, heightIn=mean(heightIn))
p + geom_hline(aes(yintercept=heightIn, colour=sex), data=hw_means,linetype="dashed", size=1)

image

添加分割标记

1
2
p <- ggplot(subset(climate, Source=="Berkeley"), aes(x=Year, y=Anomaly10y)) +geom_line()
p+ annotate("segment", x=1950, xend=1980, y=-.25, yend=-.25)

image

长方形阴影

1
2
p <- ggplot(subset(climate, Source=="Berkeley"), aes(x=Year, y=Anomaly10y)) +geom_line()
p + annotate("rect", xmin=1950, xmax=1980, ymin=-1, ymax=1, alpha=.1,fill="blue")

image

添加误差线

1
2
3
4
5
6
7
# geom_line geom_point同时用,画出线和点
# geom_errorbar,设置ymin和ymax
ce <- subset(cabbage_exp, Cultivar == "c39")
ggplot(ce, aes(x=Date, y=Weight)) +
geom_line(aes(group=1)) +
geom_point(size=4) +
geom_errorbar(aes(ymin=Weight-se, ymax=Weight+se), width=.2)

image

每个小平面增加注释,多小图

1
2
3
4
p <- ggplot(mpg, aes(x=displ, y=hwy)) + geom_point() + facet_grid(. ~ drv)
#构造注释数据框
f_labels <- data.frame(drv = c("4", "f", "r"), label = c("4wd", "Front", "Rear"))
p + geom_text(x=6, y=40, aes(label=label), data=f_labels)

image

加x、y坐标名字;标题

1
2
3
4
5
6
7
8
p+xlab("X轴名")+ylab("Y轴名")+ ggtitle("标题")
p+labs(x="Xname",y="Yname",title="title")
# 坐标格式: 数值百分号、美元符号、带逗号
p <- ggplot(df, aes(x, y)) + geom_point()
p + scale_y_continuous(labels = scales::percent)
p + scale_y_continuous(labels = scales::dollar)
p + scale_x_continuous(labels = scales::comma)

x、y坐标反转

1
p+coord_flip()

自定义顺序

1
p+scale_x_discrete(limits=c("BRM","GIR","LQC","NEL","LXC","NYC","QCC","YBC","YNC","FLV","HOL","JBC","JER","LIM","RAN"))

自定义颜色

1
2
3
# set colors; 26
mycolor=c("#A6CEE3", "#1F78B4", "#B2DF8A", "#33A02C", "#FB9A99", "#E31A1C", "#FDBF6F", "#FF7F00", "#CAB2D6", "#6A3D9A", "#FFFF99", "#B15928", "#A6CEE3", "#1F78B4", "#B2DF8A", "#33A02C", "#FB9A99", "#E31A1C", "#FDBF6F", "#FF7F00", "#CAB2D6", "#6A3D9A", "#FFFF99", "#B15928", "#A6CEE3", "#1F78B4")
p+scale_color_manual(values=mycolor) # 注意颜色数量要一致,26种颜色对应26个分类

颜色组合

1
2
# 4种
mycolor4=c("#E31A1C","#1F78B4","#6A3D9A","#FF7F00") # 红蓝紫橙

image

1
2
# 8种
mycolor8=c("#E41A1C","#377EB8","#4DAF4A","#984EA3","#FF7F00","#A65628","#F781BF","#000000") # 红蓝绿紫橙褐粉黑

image

1
2
3
# 10种
color=c("#E41A1C","#377EB8","#4DAF4A","#984EA3","#FF7F00","#A65628","#F781BF","#B2DF8A","#E7298A","#4EB3D3")
image(1:10,1,as.matrix(1:10),col=color,xlab=" ",ylab="",xaxt="n",yaxt="n",bty="n")

image

1
2
# 12种
mycolor12=c("#A6CEE3","#1F78B4","#B2DF8A","#33A02C","#FB9A99","#E31A1C","#FDBF6F","#FF7F00","#CAB2D6","#6A3D9A","#FFFF99","#B15928")

image

好看的颜色ColorBrewer

1
2
3
4
5
6
7
8
9
10
11
# ColorBrewer
p + geom_boxplot(aes(fill=factor(cyl)))+ scale_fill_brewer(palette='Paired')
# 对于一个单连续变量仅突出显示一端的色板:
Blues BuGn BuPu GnBu Greens Greys Oranges OrRd PuBu PuBuGn PuRd Purples RdPu Reds YlGn YlGnBu YlOrBr YlOrRd
# 对于一个单连续变量突出显示两端的色板
BrBG PiYG PRGn PuOr RdBu RdGy RdYlBu RdYlGn Spectral
# 离散变量的色板
Accent Dark2 Paired Pastel1 Pastel2 Set1 Set2 Set3
连续变量用Brewer色板需要调用 scale_*_distiller
scale_*_brewer
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
library(RColorBrewer)
# color names : https://github.com/lh3/gnuplot-colorbrewer
# 查看所有颜色
display.brewer.all()
# 查看颜色说明
brewer.pal.info
# 测试颜色 test colors
mycolor =brewer.pal(n=8, 'Paired')
image(1:8,1,as.matrix(1:8),col=mycolor,xlab=" ",ylab="",xaxt="n",yaxt="n",bty="n")
# 取其中几个
subcolor=mycolor[1]
subcolor=c(mycolor[4],mycolor[1:3],mycolor[7:5])
# ggplot manual colors
p+scale_color_manual(values=mycolor)
p+scale_fill_manual(values=mycolor)
p+scale_color_manual(values=c("BRM"="#E31A1C","GIR"="#E31A1C","NEL"="#E31A1C"))
p+scale_color_manual(values=c("#E31A1C","#1F78B4","#6A3D9A","#FF7F00"))#四个颜色 红蓝紫橙
# 直接用
p+scale_color_brewer(palette='Paired')

image

image

image

image

theme主题设置,X轴、Y轴操作

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
# 常用主题
# 去掉多余的白色背景、框线,坐标文字黑色、图例右侧
+theme_bw()
+theme(plot.background=element_blank(),panel.background=element_blank(),legend.background=element_blank(),panel.grid =element_blank(),axis.title =element_text(color="black"),axis.text=element_text(color="black"),legend.position='right')
# 白色主题模板; theme_bw()放前面
p+theme_bw()+theme()
p+ theme(
# 去掉网格线,分别去掉主要、次要网格线
panel.grid =element_blank(),
panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),
# 去掉外边框
panel.border=element_blank(),
# 加外边框
panel.border = element_rect(fill=NA, color="gray20", size=0.5, linetype="solid"),
# 再加上X轴、Y轴,无刻度、无标签
axis.line = element_line(colour = "black",size = 0.2),
# 去掉图例title
legend.title = element_blank()
# 去掉图例
legend.position = "none",
# X轴标签:大小、字体、颜色、加粗、位置、角度 <http://docs.ggplot2.org/current/element_text.html>,字体需要单独加载字体family=“”,其他element_text()规则一样
## face取值:plain普通,bold加粗,italic斜体,bold.italic斜体加粗
axis.title.x =element_text(size=12,color="black",face="bold",vjust = 0.5, hjust = 0.5, angle = 45),
# X轴刻度标签名
axis.text = element_blank(), # 去掉所有刻度标签
axis.text.x=element_blank(), # 去掉x刻度标签
axis.text.y=element_blank(), # 去掉y刻度标签
# Y轴
axis.title.y=element_text(size=8),
axis.text.y = element_text(size=8),
# X轴刻度线,不显示
axis.ticks = element_blank(), # 去掉所有刻度
axis.ticks.x=element_blank(), # 去掉x刻度
axis.ticks.y=element_blank(), # 去掉y刻度
)

默认主题

  • theme_classic,有x,y轴和没有网格线 #(这个不错)
  • theme_bw,高对比度主题
  • theme_gray,默认
  • theme_light
  • theme_linedraw,网格线变成黑实线
  • theme_minimal,没有背景内容
  • 更多 library(ggthemes)

theme关键词表

参数 设置内容 继承自
line 所有线属性
rect 所有矩形区域属性
text 所有文本相关属性
title 所有标题属性
axis.title 坐标轴标题 text
axis.title.x x轴属性 axis.title
axis.title.y y轴属性 axis.title
axis.text 坐标轴刻度标签属性 text
axis.text.x 属性和继承和前面类似,不再重复
axis.text.y
axis.ticks 坐标轴刻度线 line
axis.ticks.x
axis.ticks.y
axis.ticks.length 刻度线长度
axis.ticks.margin 刻度线和刻度标签之间的间距
axis.line 坐标轴线 line
axis.line.x
axis.line.y
legend.background 图例背景 rect
legend.margin 图例边界
legend.key 图例符号
legend.key.size 图例符号大小
legend.key.height 图例符号高度
legend.key.width 图例符号宽度
legend.text 图例文字标签
legend.text.align 图例文字标签对齐方式 0为左齐,1为右齐
legend.title 图例标题 text
legend.title.align 图例标题对齐方式
legend.position 图例位置 left, right, bottom, top, 两数字向量
legend.direction 图例排列方向 “horizontal” or “vertical”
legend.justification 居中方式 center或两数字向量
legend.box 多图例的排列方式 “horizontal” or “vertical”
legend.box.just 多图例居中方式
panel.background 绘图区背景 rect
panel.border 绘图区边框 rect
panel.margin 分面绘图区之间的边距
panel.grid 绘图区网格线 line
panel.grid.major 主网格线
panel.grid.minor 次网格线
panel.grid.major.x
panel.grid.major.y
panel.grid.minor.x
panel.grid.minor.y
plot.background 整个图形的背景
plot.title 图形标题
plot.margin 图形边距 top, right, bottom, left
strip.background 分面标签背景 rect
strip.text 分面标签文本 text
strip.text.x
strip.text.y

加载字体

1
2
3
4
5
6
7
8
9
# require("sysfonts")
# require("showtext")
# 查看字体列表,可以使用系统已安装的所有字体
font.files()
# 更改字体 family参数
ggplot(data.frame(x = rnorm(100))) +
geom_histogram(aes(x), fill = 'purple', alpha = 0.6) +
labs(x = 'X 取值', y = '频数 Count') +
theme(text = element_text(family = 'Arial Black'))

x、y轴,数据范围

1
2
3
4
5
6
7
# 方法一
p+ scale_y_continuous(limits=c(0,1.8))
# scale_x_continuous(limits=c(0,10))
# 方法二
p+xlim(0,10) # 不一定都适用
p + xlim(min(dt$A, 0)*1.2, max(dt$A)*1.2) ## 一般使用倍数来限定大小,注意定义最小值的方式

坐标轴刻度间隔大小

1
p + scale_x_continuous(breaks=seq(0, 10, 5)) # X轴,每隔5个单位

检验

wilcox.test()

Wilcoxon rank sum test / Wilcoxon秩和检验
Mann–Whitney U test(曼-惠特尼U检验) / Mann–Whitney–Wilcoxon (MWW)

在R中,wilcox.test()函数可以用来做Wilcoxon秩和检验,也可以用于做 Mann-Whitney U检验。

  • 当参数为单个样本/或者是两个样本相减/或者是两个参数,paired=T时,是Wilcoxon秩和检验。
  • 两个参数,paired = FALSE(独立样本)时,就是 Mann-Whitney U检验

If only x is given, or if both x and y are given and paired is TRUE, a Wilcoxon signed rank test of the null that the distribution of x (in the one sample case) or of x - y (in the paired two sample case) is symmetric about mu is performed.

Otherwise, if both x and y are given and paired is FALSE, a Wilcoxon rank sum test (equivalent to the Mann-Whitney test: see the Note) is carried out. In this case, the null hypothesis is that the distributions of x and y differ by a location shift of mu and the alternative is that they differ by some other location shift (and the one-sided alternative “greater” is that x is shifted to the right of y).

配对Wilcoxon符号秩检验

例8-1

对12份血清分别用原方法(old, 检测时间20分钟)和新方法(new, 检测时间10分钟)测谷-丙转氨酶。问两法所得结果有无差别?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
# 源代码:
old <- c(60,142,195,80,242,220,190,25,198,38,236,95)
new <- c(76,152,243,82,240,220,205,38,243,44,190,100)
wilcox.test(old,new,paired = TRUE)
# 运行结果
> wilcox.test(old,new,paired = TRUE) #配对wilcoxon检验
Wilcoxon signed rank test with continuity correction
data: old and new
V = 11.5, p-value = 0.06175
alternative hypothesis: true location shift is not equal to 0
Warning messages:
1: In wilcox.test.default(old, new, paired = TRUE) :
cannot compute exact p-value with ties
2: In wilcox.test.default(old, new, paired = TRUE) :
cannot compute exact p-value with zeroes
# 结果分析: p-value=0.06175,因此不能认为两法测谷丙转氨酶有差别。

独立样本Wilcoxon检验(Mann-Whitney U检验)

例8-2

已知某地正常人尿氟含量的中位数为45.30μmol/L 。今在该地某厂随机抽取12名工人,测得尿氟含量。问该厂工人的尿氟含量是否高于当地正常人的尿氟含量?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
# 源代码:
data82 <- c(44.21,45.30,46.39,49.47,51.05,53.16,53.26,54.37,57.16,67.37,71.05,87.37)
wilcox.test(data82-45.30)
# 运行结果:
> wilcox.test(data82,mu=45.30)
Wilcoxon signed rank test with continuity correction
data: data82
V = 65, p-value = 0.005099
alternative hypothesis: true location is not equal to 45.3
Warning message:
In wilcox.test.default(data82, mu = 45.3) :
cannot compute exact p-value with zeroes
# 结果分析:p值小于0.01,可以视为该工厂的工人尿氟含量高于正常人。

例8-3

对10例肺癌病人和12例矽肺0期工人用X光片测量肺门横径右侧距RD值(cm)。问肺癌病人的RD值是否高于矽肺0期工人的RD值?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
# 源代码:
lc <- c(2.78,3.23,4.2,4.87,5.12,6.21,7.18,8.05,8.56,9.6)
si <- c(3.23,3.5,4.04,4.15,4.28,4.34,4.47,4.64,4.75,4.82,4.95,5.1)
wilcox.test(lc,si)
#运行结果:
> wilcox.test(lc,si,alternative="greater",exact=F)
Wilcoxon rank sum test with continuity correction
data: lc and si
W = 86.5, p-value = 0.04318
alternative hypothesis: true location shift is greater than 0
# 在R中,wilcox.test()函数可以用来做Wilcoxon秩和检验,也可以用于做Mann-Whitney U检验。当参数为单个样本,或者是两个样本相减,或者是两个参数,paired=F时,是Wilcoxon秩和检验。当paired = FALSE(独立样本)时,就是Mann-Whitney U检验,在这个题目中,用的就是Mann-Whitney U检验,虽然结果中W=86.5与书中的T=141.5不一样,但本质上是一样的,换算如下:W1=141.5-10*(10+1)/2=86.5;W2=111.5-12*(12+1)/2=33.5

例8-4

已知39名吸烟工人和40名不吸烟工人的碳氧血红蛋白HbCO(%)含量,问吸烟工人的HbCO(%)含量是否高于不吸烟工人的HbCO(%)含量?

答:本题书中用的是wilcox检验,其实用Ridit分析更合适一些,下面分别用这两种方法进行检验:

  • Wilcox.test,或Kruskal检验

Wilcox.test(或Kruskal检验)

1
2
3
4
5
6
7
8
9
smoke <- c(1,8,16,10,4)
no.smoke <-c(2,23,11,4,0)
rank.c <- c(1:5)
group1 <- rep(rank.c,smoke)
group2 <- rep(rank.c,no.smoke)
data84 <- c(group1,group2)
group.f <-factor(c(rep(1,length(group1)),rep(2,length(group2))))
wilcox.test(data84~group.f)
kruskal.test(data84~group.f)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
# 运行结果:
# wilcox.test
wilcox.test(data84~group.f)
Wilcoxon rank sum test with continuity correction
data: data84 by group.f
W = 1137, p-value = 0.0002181
alternative hypothesis: true location shift is not equal to 0
Warning message:
In wilcox.test.default(x = c(1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, :
cannot compute exact p-value with ties
# kruskal.test检验
> kruskal.test(data84~group.f)
Kruskal-Wallis rank sum test
data: data84 by group.f
Kruskal-Wallis chi-squared = 13.707, df = 1, p-value = 0.0002137
  • 此题用Ridit检验更合适一些
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
data84 <- matrix(c(1,8,16,10,4,2,23,11,4,0),nrow=5)
dimnames(data84) <- list(cone=c("very low","low","median","a bit high","high"),
worker=c("smoker","no-smoker"))
library(Ridit)
ridit(data84,2)
chisq.test(data84)
# 运行结果:
> library(Ridit)
> ridit(data84,2)
Ridit Analysis:
Group Label Mean Ridit
----- ----- ----------
1 smoker 0.6159
2 no-smoker 0.387
Reference: Total of all groups
chi-squared = 13.707, df = 1, p-value = 0.0002137
> chisq.test(data84)
Pearsons Chi-squared test
data: data84
X-squared = 15.0785, df = 4, p-value = 0.004541

Kurskal-Wallis检验

Kurskal-Wallis检验是Wilcoxon方法(其实是Mann-Whitney检验)用于多于两个样本的时候的升级版。当对两个样本进行比较的时候,Kurskal-Wallis检验与Mann-Whitney检验是等价的。

例8-5

用三种药物杀灭钉螺,每批用200只活钉螺,用药后清点每批钉螺的死亡数、再计算死亡率(%),问三种药物杀灭钉螺的效果有无差别?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
# 源代码:
drug <-rep(c("甲药","乙药","丙药"),each=5)
data <- c(32.5,35.5,40.5,46,49,16,20.5,22.5,29,36,6.5,9.0,12.5,18,24)
data85 <- data.frame(drug,data)
kruskal.test(data85$data~data85$drug)
# 运行结果:
> kruskal.test(data85$data~data85$drug)
Kruskal-Wallis rank sum test
data: data85$data by data85$drug
Kruskal-Wallis chi-squared = 9.74, df = 2, p-value = 0.007673
# 结果分析:Kruskal-Wallis的统计量是H值。

例8-6

比较小白鼠接种三种不同菌型伤寒杆菌9D、11C和DSC1后存活日数,问小白鼠接种三种不同菌型伤寒杆菌的存活日数有无差别?

1
2
3
4
5
6
7
8
9
10
11
12
13
# 源代码:
mice <- as.factor(c(rep("9D",10),rep("11C",9),rep("DSC1",11)))
data86 <- c(2,2,2,3,4,4,4,5,7,7,5,5,6,6,6,7,8,10,12,3,5,6,6,6,7,7,9,10,11,11)
data86 <- data.frame(mice,data)
kruskal.test(data86$data~data86$mice)
# 运行结果:
> kruskal.test(data86$data~data86$mice)
Kruskal-Wallis rank sum test
data: data86$data by data86$mice
Kruskal-Wallis chi-squared = 9.9405, df = 2, p-value = 0.006941

例8-7

四种疾病患者痰液内嗜酸性白细胞的检查结果见表8-11。问四种疾病患者痰液内的嗜酸性白细胞有无差别?注:这道例题与《医学统计学及SAS应用》(上海交通大学)的9.11类似

image

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
# 源代码:
x1 <- c(0,2,9,6)
x2 <- c(3,5,5,2)
x3 <- c(5,7,3,2)
x4 <- c(3,5,3,0)
freq <- function(x){
count <-c()
for(i in 1:4){
count1 <-c(rep(i,x[i]))
count <- append(count,count1)
}
return(count)}
data87 <-c(freq(x1),freq(x2),freq(x3),freq(x4))
group <- c(rep(1,sum(x1)),rep(2,sum(x2)),rep(3,sum(x3)),rep(4,sum(x4)))
kruskal.test(data87~group)
# 运行结果:
> kruskal.test(data87~group)
Kruskal-Wallis rank sum test
data: data87 by group
Kruskal-Wallis chi-squared = 15.5058, df = 3, p-value = 0.001432

例8-8

对例8-6资料作三个样本间的两两比较,Nemenyi检验。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
# 源代码:
# 需要安装下列包:
# install.packages("pgirmess")
# install.packages("coin")
# install.packages("multcomp")
# library(pgirmess)
# library(coin)
# library(multcomp) # 安装并加载要用到的包
mice <- as.factor(c(rep("9D",10),rep("11C",9),rep("DSC1",11)))
data <- c(2,2,2,3,4,4,4,5,7,7,5,5,6,6,6,7,8,10,12,3,5,6,6,6,7,7,9,10,11,11)
data88 <- data.frame(mice,data)
kruskal.test(data~mice,data=data88)
kruskalmc(data~mice, data=data88, probs=0.05) # 使用kruskalmc函数做两两比较,但此方法不能给出具体的值
# 下面构建函数计算具体的p值
mult <- oneway_test(data ~ mice, data = data88,
ytrafo = function(data) trafo(data, numeric_trafo = rank),
xtrafo = function(data) trafo(data, factor_trafo = function(x)
model.matrix(~x - 1) %*% t(contrMat(table(x), "Tukey"))),
teststat = "max", distribution = approximate(B = 90000))
pvalue(mult, method = "single-step") #计算具体的p值,这段代码是复制丁香园的[1]
# 运行结果:
> kruskal.test(data~mice,data=data88)
Kruskal-Wallis rank sum test
data: data by mice
Kruskal-Wallis chi-squared = 9.9405, df = 2, p-value = 0.006941
> kruskalmc(data~mice, data=data88, probs=0.05) # 使用kruskalmc函数做两两比较,但此方法不能给出具体的值
Multiple comparison test after Kruskal-Wallis
p.value: 0.05
Comparisons
obs.dif critical.dif difference
11C-9D 10.3777778 9.683378 TRUE
11C-DSC1 0.4949495 9.472590 FALSE
9D-DSC1 10.8727273 9.208410 TRUE
> pvalue(mult, method = "single-step") #计算具体的p值
9D - 11C 0.01870000
DSC1 - 11C 0.95110000
DSC1 - 9D 0.01088889

例8-9

已知8名受试对象在相同实验条件下分别接受4种不同频率声音的刺激,他们的反应率(%)资料。问4种频率声音刺激的反应率是否有差别?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
# 源代码:
freA <- c(8.40,11.60,9.40,9.80,8.30,8.60,8.90,7.80)
freB <- c(9.60,12.70,9.10,8.70,8.00,9.80,9.00,8.20)
freC <- c(9.80,11.80,10.40,9.90,8.60,9.60,10.60,8.50)
freD <- c(11.70,12.00,9.80,12.00,8.60,10.60,11.40,10.80)
matrix89 <- matrix(c(freA,freB,freC,freD),nrow=8,dimnames=list(no=1:8,c("A","B","C","D")))
friedman.test(matrix89)
运行结果:
> friedman.test(matrix89)
Friedman rank sum test
data: matrix89
Friedman chi-squared = 15.1519, df = 3, p-value = 0.001691

手动计算PCA

prcomp

输入格式:

1
2
3
4
5
6
7
# in.txt
B1 B2 B3 B4 B5 B6 C1 C2 C3 C4 C5
2 2 1 1 2 2 1 1 1 1 1
1 1 1 1 1 0 0 0 1 0 0
3 3 3 3 2 3 2 2 2 2 2
......
......

1
2
3
4
5
6
7
8
9
10
11
12
13
14
y=read.table("in.txt",header=T)
y=as.matrix(y)
# 转置矩阵
y=t(y)
p=prcomp(y,scale=T)
p2=summary(p)
# 显示PC1-PC4
p2[5]$x[,1:4]
# 查看PC解释度 Proportion of Variance
p2[6]$importance[,1:4]
# 存到文件里
write.table(p3[5]$x[,1:4],file="pca.out",sep=" ",quote=F,row.names=F,col.names=F)
write.table(p3[6]$importance[,1:6],file="pca.out.log",sep=" ",quote=F,row.names=F,col.names=F)