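Good coding style is like correct punctuation. Its main job is to provide a consistent set of writing conventions, which makes code easier to read and also easier to write: with fixed conventions you no longer agonize over choices, so you program faster.
This R style guide is the tidyverse style guide compiled by Hadley Wickham. The original guide is a book in two parts: the first covers conventions for analysis scripts (Analyses), and the second covers coding conventions for package development. This post covers the first part, Analyses. Learning coding conventions from an expert is hard to get wrong.
This post also quotes parts of a Stanford-hosted web page on R style. That material does not conflict with Hadley Wickham's guide; it fills in details his guide does not mention. Content quoted from that page is marked with (standford).
Two R packages support this style guide: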
# Good
fit_models.R
# Bad
fit models.R
fit.r
# Ordering
00_download.R
01_explore.R
...
09_model.R
10_visualize.R
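It is hard to give one file-organization rule that suits every project. I think the best rule of thumb is this: if you can give each file a concise name that still evokes its contents, you have organized the scripts of a project well. Getting there takes effort.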
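As a rough illustration of the file-naming examples above, the sketch below checks script names with a regular expression. The helper `is_good_script_name()` and its pattern are assumptions made for this example; they are not part of the style guide, and they cannot judge whether a name is meaningful.

```r
# Hypothetical helper: TRUE when a file name uses only letters, digits,
# "_" and "-", and ends with an upper-case ".R" extension.
is_good_script_name <- function(file_names) {
  grepl("^[A-Za-z0-9_-]+\\.R$", file_names)
}

script_names <- c("fit_models.R", "fit models.R", "fit.r", "00_download.R")
name_check <- is_good_script_name(script_names)
print(name_check)
```

Only the first and last names pass: the second contains a space and the third uses a lower-case extension.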
The following comes from standford_R_style:
If everyone uses the same general ordering, we can read and understand each other's scripts faster and more easily.
General Layout and Ordering
source() and library() statements
Executed statements, if applicable (e.g., print, plot)
Unit tests should go in a separate file named originalfilename_unittest.R.
# Load data ---------------------------------------------
# Plot data ---------------------------------------------
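A minimal sketch of a script that puts its library() calls first and its executed statements (such as print) last, with section comments breaking the file into chunks. The data and the helper `summarise_heights()` are made up for illustration, and placing function definitions between the two is an assumption of this sketch.

```r
# Summarises a small vector of heights (illustrative script).

# Load packages -------------------------------------------
library(stats)

# Define functions ----------------------------------------
# Hypothetical helper: mean and standard deviation of a numeric vector.
summarise_heights <- function(heights) {
  c(mean = mean(heights), sd = sd(heights))
}

# Load data -----------------------------------------------
heights <- c(150, 160, 170, 180)

# Run analysis --------------------------------------------
height_summary <- summarise_heights(heights)
print(height_summary)
```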
"There are only two hard things in Computer Science: cache invalidation and naming things." (Phil Karlton)
# Good
day_one
# Bad
first_day_of_the_month
T <- FALSE
c <- 10
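The last two bad examples reuse names that already mean something in base R. A small sketch of why that hurts, using `T`; the variable names `masked` and `restored` exist only for this illustration.

```r
T <- FALSE             # legal, because T is an ordinary variable, not a reserved word
masked <- isTRUE(T)    # FALSE: code that relies on T as shorthand for TRUE now misbehaves
rm(T)                  # remove the masking variable; T refers to base R's TRUE again
restored <- isTRUE(T)  # TRUE
```

Writing `TRUE` and `FALSE` in full avoids the problem, since those are reserved words and cannot be reassigned.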
# Good
x[, 1]
# Bad
x[,1]
x[ ,1]
# Good
mean(x, na.rm = TRUE)
# Bad
mean (x, na.rm = TRUE)
mean( x, na.rm = TRUE)
# Good
if (debug) {
  show(x)
}
# Bad
if(debug){
  show(x)
}
# Good
function(x) {}
# Bad
function (x) {}
function(x){}
# Good
max_by <- function(data, var, by) {
  data %>%
    group_by({{ by }}) %>%
    summarise(maximum = max({{ var }}, na.rm = TRUE))
}
# Bad
max_by <- function(data, var, by) {
  data %>%
    group_by({{by}}) %>%
    summarise(maximum = max({{var}}, na.rm = TRUE))
}
# Good
height <- (feet * 12) + inches
# Bad
height <- (feet*12)+inches
# Good
sqrt(x^2 + y^2)
df$z
x <- 1:10
# Bad
sqrt(x ^ 2 + y ^ 2)
df $ z
x <- 1 : 10
# Good
package?stats
?mean
# Bad
package ? stats
? mean
# Good
list(
  total = a + b + c,
  mean  = total / n
)
# Also fine
list(
  total = a + b + c,
  mean = total / n
)
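Function arguments generally fall into two categories: those that supply the data to compute on, and those that control the details of the computation.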
# Good
mean(1:10, na.rm = TRUE)
# Bad
mean(1:10, , FALSE)
mean(, TRUE, x = c(1:10, NA))
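A short runnable sketch of the same idea: the data argument is passed by position, while the detail argument `na.rm` is spelled out by name. The vector `values` is made up for illustration.

```r
values <- c(1:10, NA)

# Data argument by position, no details: the NA propagates.
mean_with_na <- mean(values)

# Detail argument given by its full name, so the intent is readable.
mean_without_na <- mean(values, na.rm = TRUE)
print(mean_without_na)
```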
# Good
x <- complicated_function()
if (nzchar(x) < 1) {
  # do something
}
# Bad
if (nzchar(x <- complicated_function()) < 1) {
  # do something
}
# Exception: functions that capture side effects
output <- capture.output(x <- f())
# Good
if (y < 0 && debug) {
  message("y is negative")
}
if (y == 0) {
  if (x > 0) {
    log(x)
  } else {
    message("x is negative or zero")
  }
} else {
  y^x
}
test_that("call returns an ordered factor", {
  expect_s3_class(call1(x, y), c("factor", "ordered"))
})
tryCatch(
  {
    x <- scan()
    cat("Total: ", sum(x), "\n", sep = "")
  },
  interrupt = function(e) {
    message("Aborted by user")
  }
)
# Bad
if (y < 0 && debug) {
message("y is negative")
}
if (y == 0)
{
    if (x > 0) {
      log(x)
    } else {
  message("x is negative or zero")
    }
} else {
y ^ x}
# Good
y <- 10
x <- if (y < 20) "Too low" else "Too high"
# Good
if (y < 0) {
  stop("Y is negative")
}
find_abs <- function(x) {
  if (x > 0) {
    return(x)
  }
  x * -1
}
# Bad
if (y < 0) stop("Y is negative")
find_abs <- function(x) {
  if (x > 0) return(x)
  x * -1
}
# Good
if (length(x) > 0) {
  # do something
}
# Bad
if (length(x)) {
  # do something
}
# Good
switch(x,
  a = ,
  b = 1,
  c = 2,
  stop("Unknown `x`", call. = FALSE)
)
# Bad
switch(x, a = , b = 1, c = 2)
switch(y, 1, 2, 3)
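To see the fall-through and the error fallback in action, here is a small sketch that wraps a character-input `switch()` in a function. The helper name `code_to_number()` is hypothetical.

```r
# Hypothetical helper: map a letter code to a number.
code_to_number <- function(x) {
  switch(x,
    a = ,  # empty case: falls through to the next non-empty one
    b = 1,
    c = 2,
    stop("Unknown `x`", call. = FALSE)
  )
}

result_a <- code_to_number("a")  # same value as "b"
result_c <- code_to_number("c")

# An unmatched input reaches the final unnamed argument and raises an error.
result_z <- try(code_to_number("z"), silent = TRUE)
```

Without the trailing `stop()`, an unmatched input would silently return `NULL` invisibly, which is why the one-line version in the bad example is easy to misuse.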
# Good
do_something_very_complicated(
  something = "that",
  requires = many,
  arguments = "some of which may be long"
)
# Bad
do_something_very_complicated("that", requires, many, arguments,
                              "some of which may be long"
                              )
# Good
paste0(
  "Requirement: ", requires, "\n",
  "Result: ", result, "\n"
)
# Bad
paste0(
  "Requirement: ", requires,
  "\n", "Result: ",
  result, "\n"
)
# Good
"Text"
'Text with "quotes"'
'<a href="http://style.tidyverse.org">A link</a>'
# Bad
'Text'
'Text with "double" and \'single\' quotes'
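A comment line starts with a single # followed by one space.
Short comments can be placed after the code, preceded by two spaces, then #, then one space (standford).
In data analysis code, use comments to record important findings and analysis decisions rather than to explain what the code is doing. If you need comments to explain what the code does, consider rewriting the code to be simpler.
If you find that the comments outweigh the code, switch to R Markdown or a Jupyter notebook.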
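A small sketch of comments that record a finding and the decision it led to, plus a short trailing comment in the two-spaces style. The reaction-time data are made up for illustration.

```r
# Made-up reaction times in milliseconds.
reaction_ms <- c(310, 295, 2050, 330, 305)

# The 2050 ms trial is far outside the rest, so the median is reported
# instead of the mean.
typical_reaction <- median(reaction_ms)  # in milliseconds
print(typical_reaction)
```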
# Good
add_row()
permute()
# Bad
row_adder()
permutation()
# Good
long_function_name <- function(a = "a long argument",
                               b = "another argument",
                               c = "another long argument") {
  # As usual code is indented by two spaces.
}
# Good
find_abs <- function(x) {
  if (x > 0) {
    return(x)
  }
  x * -1
}
add_two <- function(x, y) {
  x + y
}
# Bad
add_two <- function(x, y) {
  return(x + y)
}
# Good
print.url <- function(x, ...) {
  cat("Url: ", build_url(x), "\n", sep = "")
  invisible(x)
}
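The example above depends on a `build_url()` function that is not shown, so here is a self-contained sketch of the same pattern: a print method called for its side effect that returns its input invisibly. The `temperature` class and `new_temperature()` constructor are hypothetical.

```r
# Hypothetical S3 class holding a value in degrees Celsius.
new_temperature <- function(celsius) {
  structure(list(celsius = celsius), class = "temperature")
}

print.temperature <- function(x, ...) {
  cat("Temperature: ", x$celsius, " C\n", sep = "")
  invisible(x)  # return the object without auto-printing it again
}

temp <- new_temperature(21.5)

# withVisible() reports both the returned value and its visibility flag.
returned <- withVisible(print(temp))
```

Because the method returns `x`, the call can still be used inside a larger expression or a pipeline; because the return is invisible, nothing extra is printed at the console.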
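Functions should contain a comments section immediately below the function definition line. The comments should include a one-sentence description of the function, a list of its arguments under Args:, and a description of the return value under Returns:.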
# Example function
CalculateSampleCovariance <- function(x, y, verbose = TRUE) {
  # Computes the sample covariance between two vectors.
  #
  # Args:
  #   x: One of two vectors whose sample covariance is to be calculated.
  #   y: The other vector. x and y must have the same length, greater than one,
  #      with no missing values.
  #   verbose: If TRUE, prints sample covariance; if not, not. Default is TRUE.
  #
  # Returns:
  #   The sample covariance between x and y.
  n <- length(x)
  # Error handling
  if (n <= 1 || n != length(y)) {
    stop("Arguments x and y have invalid lengths: ",
         length(x), " and ", length(y), ".")
  }
  if (TRUE %in% is.na(x) || TRUE %in% is.na(y)) {
    stop(" Arguments x and y must not have missing values.")
  }
  covariance <- var(x, y)
  if (verbose)
    cat("Covariance = ", round(covariance, 4), ".\n", sep = "")
  return(covariance)
}
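The pipe %>% is meant to emphasize a sequence of actions, rather than the object the actions are performed on.
Avoid using the pipe in the following situations: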
# Good
iris %>%
  group_by(Species) %>%
  summarise(
    Sepal.Length = mean(Sepal.Length),
    Sepal.Width = mean(Sepal.Width),
    Species = n_distinct(Species)
  )
iris_long <-
  iris %>%
  gather(measure, value, -Species) %>%
  arrange(-value)
iris_long <- iris %>%
  gather(measure, value, -Species) %>%
  arrange(-value)
iris %>%
  gather(measure, value, -Species) %>%
  arrange(-value) ->
  iris_long
The conventions for using + in ggplot are very similar to those for using %>% in pipelines.
# Good
iris %>%
  filter(Species == "setosa") %>%
  ggplot(aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_point()
# Bad
iris %>%
  filter(Species == "setosa") %>%
  ggplot(aes(x = Sepal.Width, y = Sepal.Length)) +
    geom_point()
Original post: https://www.cnblogs.com/songbiao/p/13042372.html