# R – Sorting a data frame by the contents of a column

February 12, 2010

(This article was first published on Developmentality » R, and kindly contributed to R-bloggers)

Let’s examine how to sort the contents of a data frame by the values in one of its columns. We start by generating some sample data:

```
> numPeople = 10
> sex = sample(c("male","female"), numPeople, replace=TRUE)
> age = sample(14:102, numPeople, replace=TRUE)
> income = sample(20:150, numPeople, replace=TRUE)
> minor = age < 18
```

This last statement might look surprising if you’re used to Java or another traditional programming language. Rather than becoming a single boolean (truth) value, minor becomes a vector of truth values, one per element of the age vector. It’s equivalent to this much more verbose Java code:

```
int[] age = ...;
// One result per element of age
boolean[] minor = new boolean[age.length];
for (int i = 0; i < age.length; i++) {
    minor[i] = age[i] < 18;
}
```
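
To make the comparison with the Java loop concrete, here is a minimal sketch, not part of the original post, that builds the same logical vector both ways in R. The ages are copied from the data frame printed later in this article, and the name minor_loop is purely illustrative.

```r
# Ages copied from the data frame printed in this article
age <- c(68, 48, 68, 27, 84, 92, 35, 15, 89, 26)

# Vectorized comparison: one logical value per element of age
minor <- age < 18

# The same result built element by element, as the Java loop does
minor_loop <- logical(length(age))
for (i in seq_along(age)) {
  minor_loop[i] <- age[i] < 18
}

identical(minor, minor_loop)  # checks that both approaches agree
which(minor)                  # positions of the TRUE entries
sum(minor)                    # TRUE counts as 1, so this counts the minors
```

The vectorized form is the idiomatic one in R; the loop is shown only to mirror the Java version.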

Just as expected, the value of minor is a vector:

```
> mode(minor)
[1] "logical"
> minor
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
```

Next we create a data frame, which groups our various vectors together as the columns of a single data structure:

```
> population = data.frame(sex=sex, age=age, income=income, minor=minor)
> population
      sex age income minor
1    male  68    150 FALSE
2    male  48     21 FALSE
3  female  68     58 FALSE
4  female  27    124 FALSE
5  female  84    103 FALSE
6    male  92    112 FALSE
7    male  35     65 FALSE
8  female  15    117  TRUE
9    male  89     95 FALSE
10   male  26     54 FALSE
```

The arguments (sex=sex, age=age, income=income, minor=minor) give the columns the same names as the original vectors; I could just as easily have called them anything. For instance:

```
> data.frame(a=sex, b=age, c=income, minor=minor)
        a  b   c minor
1    male 68 150 FALSE
2    male 48  21 FALSE
3  female 68  58 FALSE
4  female 27 124 FALSE
5  female 84 103 FALSE
6    male 92 112 FALSE
7    male 35  65 FALSE
8  female 15 117  TRUE
9    male 89  95 FALSE
10   male 26  54 FALSE
```

But I prefer the more descriptive labels I gave previously.

```
> population
      sex age income minor
1    male  68    150 FALSE
2    male  48     21 FALSE
3  female  68     58 FALSE
4  female  27    124 FALSE
5  female  84    103 FALSE
6    male  92    112 FALSE
7    male  35     65 FALSE
8  female  15    117  TRUE
9    male  89     95 FALSE
10   male  26     54 FALSE
```

Now let's say we want to order the rows by the age of the people. That is a one-liner:

```
> population[order(population$age),]
      sex age income minor
8  female  15    117  TRUE
10   male  26     54 FALSE
4  female  27    124 FALSE
7    male  35     65 FALSE
2    male  48     21 FALSE
1    male  68    150 FALSE
3  female  68     58 FALSE
5  female  84    103 FALSE
9    male  89     95 FALSE
6    male  92    112 FALSE
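
# Sketch (not from the original post): two common variations on the same idiom.
# The data frame below re-creates the values printed above so the snippet runs
# on its own; the name `people` is only illustrative.
people <- data.frame(
  sex    = c("male", "male", "female", "female", "female",
             "male", "male", "female", "male", "male"),
  age    = c(68, 48, 68, 27, 84, 92, 35, 15, 89, 26),
  income = c(150, 21, 58, 124, 103, 112, 65, 117, 95, 54)
)

# Oldest first: ask order() for decreasing order
oldest_first <- people[order(people$age, decreasing = TRUE), ]

# Sort by sex, then by age within each sex: extra arguments to order() break ties
by_sex_then_age <- people[order(people$sex, people$age), ]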

```

This is not magic; you can select arbitrary rows from any data frame with the same syntax:

```
> population[c(1,2,3),]
     sex age income minor
1   male  68    150 FALSE
2   male  48     21 FALSE
3 female  68     58 FALSE
```

The order function merely returns the indices of the rows in sorted (ascending) order.

```
> order(population$age)
 [1]  8 10  4  7  2  1  3  5  9  6
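
# Sketch (not from the original post): order() returns positions, not values.
# The ages are copied from the data frame above.
age <- c(68, 48, 68, 27, 84, 92, 35, 15, 89, 26)
idx <- order(age)               # position of the smallest age, then the next, ...
age[idx]                        # indexing with those positions yields the sorted ages
identical(age[idx], sort(age))  # compares against what sort() returns directly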

```

Note the $ syntax; you select a column of a data frame by using a dollar sign and the name of the column. You can retrieve the names of the columns of a data frame with the names function.

```
> names(population)
[1] "sex"    "age"    "income" "minor"

> population$income
 [1] 150  21  58 124 103 112  65 117  95  54
> income
 [1] 150  21  58 124 103 112  65 117  95  54
```

As you can see, population$income and the original income vector are exactly the same.

So what we're really doing with the command

```
population[order(population$age),]
```

is

```
population[c(8,10,4,7,2,1,3,5,9,6),]
```

Note the trailing comma: leaving the position after it empty means we take all the columns. If we only wanted certain columns, we could specify them after this comma.

```
> population[order(population$age),c(1,2)]
      sex age
8  female  15
10   male  26
4  female  27
7    male  35
2    male  48
1    male  68
3  female  68
5  female  84
9    male  89
6    male  92

```
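
Selecting columns by position, as in c(1,2) above, depends on the column order staying fixed. As a hedged variation that is not part of the original post, columns can also be picked by name. The sketch below re-creates the article's values in a data frame called people (an illustrative name) so that it runs on its own.

```r
# Values copied from the data frame printed in this article
people <- data.frame(
  sex    = c("male", "male", "female", "female", "female",
             "male", "male", "female", "male", "male"),
  age    = c(68, 48, 68, 27, 84, 92, 35, 15, 89, 26),
  income = c(150, 21, 58, 124, 103, 112, 65, 117, 95, 54)
)

# Rows sorted by age, keeping only the sex and age columns, chosen by name
sorted_subset <- people[order(people$age), c("sex", "age")]

# Caution: selecting a single column this way returns a plain vector
# unless drop = FALSE is given
ages_only <- people[order(people$age), "age", drop = FALSE]
```

Names keep working if a column is later added or moved, which is why I would lean towards them in longer scripts.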