
R – Sorting a data frame by the contents of a column

February 12, 2010
By

(This article was first published on Developmentality » R, and kindly contributed to R-bloggers)

Let’s examine how to sort the rows of a data frame by the values in one of its columns.

> numPeople = 10
> sex = sample(c("male", "female"), numPeople, replace = TRUE)
> age = sample(14:102, numPeople, replace = TRUE)
> income = sample(20:150, numPeople, replace = TRUE)
> minor = age < 18

This last statement might look surprising if you’re used to Java or another traditional programming language. Rather than becoming a single boolean value, minor becomes a vector of truth values, one per element of the age vector. It’s equivalent to this much more verbose Java code:

int[] age = ...;
boolean[] minor = new boolean[age.length];
for (int i = 0; i < age.length; i++) {
    minor[i] = age[i] < 18;
}
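The same contrast can be sketched in R itself. The snippet below is a minimal illustration, not part of the original walkthrough: it hard-codes the ten ages that appear in this article’s population table (the original code draws them with sample(), so your values will differ) and checks that an explicit loop gives the same logical vector as the one-line comparison.

```r
# Ages copied from the population table in this article (assumed fixed data;
# the article itself generates them with sample())
age <- c(68, 48, 68, 27, 84, 92, 35, 15, 89, 26)

# Explicit loop, mirroring the Java version
minor_loop <- logical(length(age))   # preallocate a vector of FALSE values
for (i in seq_along(age)) {
  minor_loop[i] <- age[i] < 18
}

# Vectorized version: the comparison is applied to every element at once
minor_vec <- age < 18

identical(minor_loop, minor_vec)  # TRUE
```

The vectorized form is shorter and is also the idiomatic way to write this in R.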

As expected, minor is a vector of logical values:

> mode(minor)
[1] "logical"
> minor
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE

Next we create a data frame, which groups our vectors together as the columns of a single table:

> population = data.frame(sex=sex, age=age, income=income, minor=minor)
> population
      sex age income minor
1    male  68    150 FALSE
2    male  48     21 FALSE
3  female  68     58 FALSE
4  female  27    124 FALSE
5  female  84    103 FALSE
6    male  92    112 FALSE
7    male  35     65 FALSE
8  female  15    117  TRUE
9    male  89     95 FALSE
10   male  26     54 FALSE
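Because sample() produces different values on every run, your table will not match the one above. If you want to follow along with these exact numbers, one option (a sketch, not the author’s code) is to type the vectors in directly; nrow() and ncol() then confirm that each vector became a column and each element a row.

```r
# Fixed copies of the randomly sampled vectors, taken from the table above
sex    <- c("male", "male", "female", "female", "female",
            "male", "male", "female", "male", "male")
age    <- c(68, 48, 68, 27, 84, 92, 35, 15, 89, 26)
income <- c(150, 21, 58, 124, 103, 112, 65, 117, 95, 54)
minor  <- age < 18

population <- data.frame(sex = sex, age = age, income = income, minor = minor)

nrow(population)  # 10: one row per element of each vector
ncol(population)  # 4: one column per vector
```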

The arguments sex=sex, age=age, income=income, minor=minor give the columns the same names as the original vectors; I could just as easily have called them anything else. For instance,

> data.frame(a=sex, b=age, c=income, minor=minor)
        a  b   c minor
1    male 68 150 FALSE
2    male 48  21 FALSE
3  female 68  58 FALSE
4  female 27 124 FALSE
5  female 84 103 FALSE
6    male 92 112 FALSE
7    male 35  65 FALSE
8  female 15 117  TRUE
9    male 89  95 FALSE
10   male 26  54 FALSE
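Column names can also be changed after the data frame exists, by assigning to names(). A minimal sketch using a three-row excerpt of the table above typed in by hand; relabeled is a hypothetical name chosen for illustration:

```r
# A three-row excerpt of the population table, typed in as fixed data
population <- data.frame(
  sex    = c("male", "male", "female"),
  age    = c(68, 48, 68),
  income = c(150, 21, 58),
  minor  = c(FALSE, FALSE, FALSE)
)

# Work on a copy so the descriptive names in `population` stay untouched
relabeled <- population
names(relabeled) <- c("a", "b", "c", "minor")

names(relabeled)   # "a" "b" "c" "minor"
names(population)  # "sex" "age" "income" "minor"
```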

But I prefer the more descriptive labels I gave previously.

> population
      sex age income minor
1    male  68    150 FALSE
2    male  48     21 FALSE
3  female  68     58 FALSE
4  female  27    124 FALSE
5  female  84    103 FALSE
6    male  92    112 FALSE
7    male  35     65 FALSE
8  female  15    117  TRUE
9    male  89     95 FALSE
10   male  26     54 FALSE

Now let’s say we want to sort the rows by age. That takes just one line:

> population[order(population$age),]
      sex age income minor
8  female  15    117  TRUE
10   male  26     54 FALSE
4  female  27    124 FALSE
7    male  35     65 FALSE
2    male  48     21 FALSE
1    male  68    150 FALSE
3  female  68     58 FALSE
5  female  84    103 FALSE
9    male  89     95 FALSE
6    male  92    112 FALSE
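order() also handles the two variations that come up most often: sorting from largest to smallest, and breaking ties with a second column. A sketch using the values from the table above as fixed data; oldest_first and by_sex_then_age are illustrative names:

```r
# Fixed data typed in from the population table (the article uses sample())
population <- data.frame(
  sex    = c("male", "male", "female", "female", "female",
             "male", "male", "female", "male", "male"),
  age    = c(68, 48, 68, 27, 84, 92, 35, 15, 89, 26),
  income = c(150, 21, 58, 124, 103, 112, 65, 117, 95, 54)
)

# Oldest first: ask order() for decreasing order
oldest_first <- population[order(population$age, decreasing = TRUE), ]
head(oldest_first$age, 3)     # 92 89 84

# Sort by sex, then by age within each sex; later arguments break ties
by_sex_then_age <- population[order(population$sex, population$age), ]
head(by_sex_then_age$age, 4)  # 15 27 68 84 (the four females)
```

For a numeric column, order(-population$age) gives the same descending order; decreasing = TRUE also works for character columns, where negation does not.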

This is not magic; you can select arbitrary rows from any data frame with the same syntax:

> population[c(1,2,3),]
     sex age income minor
1   male  68    150 FALSE
2   male  48     21 FALSE
3 female  68     58 FALSE
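The row index does not have to be a vector of positions. A logical vector with one TRUE or FALSE per row works too, as does a vector of negative positions to drop rows. A sketch with the same fixed data (over_80 is an illustrative name):

```r
# Fixed data typed in from the population table (the article uses sample())
population <- data.frame(
  sex    = c("male", "male", "female", "female", "female",
             "male", "male", "female", "male", "male"),
  age    = c(68, 48, 68, 27, 84, 92, 35, 15, 89, 26),
  income = c(150, 21, 58, 124, 103, 112, 65, 117, 95, 54)
)
population$minor <- population$age < 18

# Logical index: keep the rows where the condition is TRUE
over_80 <- population[population$age > 80, ]
rownames(over_80)           # "5" "6" "9"

# The minor column is already a logical vector, so it can index rows directly
population[population$minor, ]

# Negative positions drop rows instead of keeping them
nrow(population[-(1:3), ])  # 7
```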

The order function merely returns the row indices in the order that would sort the column; indexing the data frame with those indices is what actually rearranges the rows.

> order(population$age)
 [1]  8 10  4  7  2  1  3  5  9  6
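One way to see that order() returns positions rather than values is to compare it with sort(), which returns the sorted values themselves. A minimal sketch using the ages from the table above as fixed data:

```r
# Ages typed in from the population table (the article uses sample())
age <- c(68, 48, 68, 27, 84, 92, 35, 15, 89, 26)

idx <- order(age)
idx                              # 8 10 4 7 2 1 3 5 9 6  (positions)
age[idx]                         # 15 26 27 35 48 68 68 84 89 92  (values)
identical(age[idx], sort(age))   # TRUE
```

Because order() returns positions, the same index vector can reorder every column of a data frame consistently, which sort() alone cannot do.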

Note the $ syntax; you select columns of a data frame by using a dollar sign and the name of the column. You can retrieve the names of the columns of a data frame with the names function.

> names(population)
[1] "sex"    "age"    "income" "minor"

> population$income
 [1] 150  21  58 124 103 112  65 117  95  54
> income
 [1] 150  21  58 124 103 112  65 117  95  54

As you can see, the income column of the data frame holds exactly the same values as the original income vector.
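The dollar sign is not the only way to pull out a column. The sketch below, again on fixed data typed in from the table above, shows two bracket forms that return the same vector; the [[ ]] form is handy when the column name is stored in a variable, where population$col would look for a column literally named col.

```r
# Fixed data typed in from the population table (the article uses sample())
population <- data.frame(
  sex    = c("male", "male", "female", "female", "female",
             "male", "male", "female", "male", "male"),
  age    = c(68, 48, 68, 27, 84, 92, 35, 15, 89, 26),
  income = c(150, 21, 58, 124, 103, 112, 65, 117, 95, 54)
)

# Three ways to extract the income column as a plain vector
by_dollar  <- population$income
by_name    <- population[["income"]]
by_bracket <- population[, "income"]

identical(by_dollar, by_name)     # TRUE
identical(by_dollar, by_bracket)  # TRUE

# [[ ]] accepts a variable holding the column name; $ does not
col <- "income"
identical(population[[col]], by_dollar)  # TRUE
```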

So what we're really doing with the command

population[order(population$age),]

is

population[c(8,10,4,7,2,1,3,5,9,6),]

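Note the trailing comma: inside the brackets, the row selection comes before the comma and the column selection after it, so leaving the part after the comma empty means "take all the columns." If we wanted only certain columns, we could list them after the comma: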

> population[order(population$age),c(1,2)]
      sex age
8  female  15
10   male  26
4  female  27
7    male  35
2    male  48
1    male  68
3  female  68
5  female  84
9    male  89
6    male  92
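Columns can also be selected by name, which keeps the code correct even if the column order changes later. When only one column is selected, R simplifies the result to a plain vector unless drop = FALSE is given. A sketch on the same fixed data (youngest_first, ages_vector and ages_frame are illustrative names):

```r
# Fixed data typed in from the population table (the article uses sample())
population <- data.frame(
  sex    = c("male", "male", "female", "female", "female",
             "male", "male", "female", "male", "male"),
  age    = c(68, 48, 68, 27, 84, 92, 35, 15, 89, 26),
  income = c(150, 21, 58, 124, 103, 112, 65, 117, 95, 54)
)
population$minor <- population$age < 18

# Rows sorted by age, columns picked by name rather than position
youngest_first <- population[order(population$age), c("sex", "age")]
names(youngest_first)       # "sex" "age"

# A single selected column is simplified to a vector by default...
ages_vector <- population[order(population$age), "age"]
# ...while drop = FALSE keeps it as a one-column data frame
ages_frame <- population[order(population$age), "age", drop = FALSE]

is.data.frame(ages_vector)  # FALSE
is.data.frame(ages_frame)   # TRUE
```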
