Tables and data manipulation
Tables are a form of data visualization.
Within “base R”, personal favorites are table() and aggregate().
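A minimal sketch of both functions, using R's built-in mtcars data as a stand-in (an assumption; the original data set is not shown). The object names tab and tmp are only illustrative.

```r
# One-way table: how many cars have 4, 6, or 8 cylinders
tab <- table(mtcars$cyl)
tab

# Two-way table: cylinders (rows) by number of forward gears (columns)
table(mtcars$cyl, mtcars$gear)

# aggregate(): mean mpg within each cylinder group
tmp <- aggregate(mtcars$mpg, by = list(mtcars$cyl), FUN = mean)
tmp
```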
The results can be stored as objects, such as tab or tmp. We can apply any relevant function by one group, or by more groups.
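A sketch of grouping by one variable and then by two, again assuming mtcars as example data. Each extra element of the by list adds another grouping variable.

```r
# One grouping variable: mean horsepower for each number of gears
by1 <- aggregate(mtcars$hp, by = list(mtcars$gear), FUN = mean)
by1

# Two grouping variables: mean mpg for each cylinder x transmission (am) combination
by2 <- aggregate(mtcars$mpg, by = list(mtcars$cyl, mtcars$am), FUN = mean)
by2
```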
Explore relevant functions, perhaps starting from ?mean, to find related calculations.
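As a sketch (mtcars again assumed as example data), any function that reduces a vector to a single value can replace mean; here median, sd, and length are applied in turn.

```r
# Other summary functions can be passed as FUN in place of mean
funs <- list(median = median, sd = sd, n = length)

# Apply each one to mpg within cylinder groups; the result is a list of data frames
res <- lapply(funs, function(f) aggregate(mtcars$mpg, by = list(cyl = mtcars$cyl), FUN = f))

res$median
res$n
```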
An alternative “model” (formula) syntax is possible! This is especially nice because the “group names” (the original column names) are preserved.
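A minimal sketch of the formula interface, assuming mtcars as example data. The left side of ~ is the variable to summarize and the right side lists the grouping variables.

```r
# Formula syntax: response ~ grouping variables
fm <- aggregate(mpg ~ cyl + am, data = mtcars, FUN = mean)
fm  # columns are named cyl, am, mpg rather than Group.1, Group.2, x
```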
We can use the related command merge() to combine multiple aggregated results, and names() to give the combined table some organization. The names Group.1 and Group.2 are assigned automatically when the by list passed to aggregate() is unnamed.
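A sketch of merging two summaries over the same groups and then renaming the columns. The data set (mtcars) and the new column names are illustrative assumptions.

```r
# Two summaries over the same groups; both get the default names Group.1 and x
m_mpg <- aggregate(mtcars$mpg, by = list(mtcars$cyl), FUN = mean)
m_hp  <- aggregate(mtcars$hp,  by = list(mtcars$cyl), FUN = mean)

# Combine by the shared group column (the two x columns become x.x and x.y)
both <- merge(m_mpg, m_hp, by = "Group.1")

# Replace the automatic names with descriptive ones
names(both) <- c("cyl", "mean_mpg", "mean_hp")
both
```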
Using the iris data, we can quickly create comprehensive summaries.
Each row in the data contains a species label and 4 numerical measurements. We can simultaneously compute multiple variable means within species.
We could have selected the columns by name, but here the column numbers are convenient.
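A sketch of computing all four species means at once. In iris the four numeric measurements are columns 1 through 4 and Species is column 5; the formula version with . (meaning "all other columns") is shown for comparison.

```r
# Select the four measurement columns by number and average them within species
iris_means <- aggregate(iris[, 1:4], by = list(Species = iris$Species), FUN = mean)
iris_means

# Equivalent formula version: summarize every other column by Species
iris_formula <- aggregate(. ~ Species, data = iris, FUN = mean)
```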
Data is too often recorded in “wide” form (for example, one column per time point), but “long” form (one row per observation) is preferred.
The reshape() command can be used for converting between “shapes”. This can get slow and messy, so we will save it for a later date.
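For a quick preview only, here is a minimal wide-to-long conversion on a tiny hypothetical data frame (the columns id, score.1, score.2 and the name visit are invented for illustration).

```r
# Hypothetical wide data: one row per subject, one column per visit
wide <- data.frame(id = 1:2, score.1 = c(10, 20), score.2 = c(11, 21))

# Long form: one row per subject per visit
long <- reshape(wide, direction = "long",
                varying = list(c("score.1", "score.2")),
                v.names = "score", timevar = "visit", times = 1:2,
                idvar = "id")
long
```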
Dates, entered in a variety of formats, make analyses challenging.
Use substr() to extract parts of character-formatted dates, then try aggregating observations by year (or by other time increments).
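A sketch of aggregating by year, assuming hypothetical observations whose dates are stored as "YYYY-MM-DD" character strings (the data frame obs and its columns are invented for illustration).

```r
# Hypothetical observations with character dates in YYYY-MM-DD form
obs <- data.frame(
  date  = c("2019-03-14", "2019-11-02", "2020-01-20", "2020-06-05", "2021-08-30"),
  count = c(4, 6, 3, 5, 7)
)

# Characters 1 through 4 of each string are the 4-digit year
obs$year <- substr(obs$date, 1, 4)

# Total count per year
by_year <- aggregate(count ~ year, data = obs, FUN = sum)
by_year
```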
substr(x, 1, 4) takes a character column and, row by row, extracts characters 1 through 4 (a 4-digit year). Use julian() to compute a “Julian” date: the day as a number, counted from an origin (1970-01-01 by default).
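A short sketch of julian() on Date objects; the dates are arbitrary examples. Choosing the first date as the origin gives elapsed days within a study.

```r
d <- as.Date(c("2020-01-01", "2020-03-01", "2021-01-01"))

# Days since the default origin, 1970-01-01
julian(d)

# Days since a chosen origin, here the earliest date
elapsed <- as.numeric(julian(d, origin = min(d)))
elapsed
```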
Ask questions along the way!
If you have a plot, consider annotating it with text().
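A minimal sketch of annotating a histogram with text(), assuming mtcars$mpg as example data. text() places a label at given x and y coordinates; pos = 4 puts it to the right of that point.

```r
x <- mtcars$mpg

# hist() draws the plot and also returns the bin counts
h <- hist(x, main = "Miles per gallon", xlab = "mpg")

# Mark the mean with a dashed line and label it near the top of the plot
abline(v = mean(x), lty = 2)
text(mean(x), 0.9 * max(h$counts), labels = paste("mean =", round(mean(x), 1)), pos = 4)
```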