Data manipulation

This is based on a book: Data Manipulation with R (Phil Spector).

page 136:
Datasets can be wide or long. When there are multiple occurrences of values for a single observation:

  • a dataset is said to be long if each occurrence is a separate row in the data frame (most IDR data, EAV design).
  • a dataset is said to be wide if all of the occurrences of values for a given observation are in the same row.
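
To make the distinction concrete, here is a small sketch of the same measurements stored both ways. The data and the column names (id, visit, value) are made up for illustration and are not from the book.

```r
# Hypothetical data: two subjects, each measured at two visits.

# Long: one row per (subject, visit) occurrence.
long <- data.frame(
  id    = c(1, 1, 2, 2),
  visit = c(1, 2, 1, 2),
  value = c(5.1, 5.4, 6.0, 6.3)
)

# Wide: one row per subject, one column per visit.
wide <- data.frame(
  id      = c(1, 2),
  value.1 = c(5.1, 6.0),
  value.2 = c(5.4, 6.3)
)
```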

R’s built-in reshape function (in the stats package) converts a data frame between the long and wide layouts (http://stat.ethz.ch/R-manual/R-patched/library/stats/html/reshape.html).
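
A minimal sketch of a long-to-wide conversion with reshape, assuming a made-up data frame with columns id, visit and value (these names are only for illustration):

```r
# Hypothetical long data: one row per (id, visit) pair.
long <- data.frame(
  id    = c(1, 1, 2, 2),
  visit = c(1, 2, 1, 2),
  value = c(5.1, 5.4, 6.0, 6.3)
)

# idvar identifies the observation, timevar distinguishes the occurrences,
# and v.names names the column that is spread across the new columns.
wide <- reshape(long,
                idvar     = "id",
                timevar   = "visit",
                v.names   = "value",
                direction = "wide")
```

The result has one row per id, with the value column split into one column per visit.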

Also, a dataset can be “melted” into long form and then cast to a desired shape, using the reshape package (http://cran.r-project.org/web/packages/reshape/reshape.pdf).

library(reshape)
melted_data <- melt(data)
# PARAMS is a placeholder for the casting formula and any other arguments
desired_shape_data <- cast(PARAMS, data = melted_data)
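
As a rough illustration of the melt and cast round trip, assuming the reshape package is installed and using a made-up data frame (the column names id, a and b are hypothetical):

```r
library(reshape)

# Hypothetical wide data: one row per id, two measured variables.
wide <- data.frame(id = c(1, 2), a = c(5.1, 6.0), b = c(5.4, 6.3))

# melt stacks the measured columns into (variable, value) pairs,
# keeping id as the identifier column.
molten <- melt(wide, id.vars = "id")

# cast spreads the molten data back out: one row per id,
# one column per level of variable.
recast <- cast(molten, id ~ variable)
```

The formula passed to cast puts the row identifiers on the left of ~ and the variables that become columns on the right.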

Very useful.
