[R] should I use apply, sapply, etc. instead of a "for loop"?

From: Thomas Pujol <thomas.pujol_at_yahoo.com>
Date: Wed, 20 Jun 2007 11:58:08 -0700 (PDT)

I have been trying to learn the various "apply" functions but am still learning their appropriate use. I appreciate any help the R community can offer me. Sorry for the length of this post.


I have data on my hard drive organized in the following manner:

The data pertains to many different "samples" of data (e.g. sample 001, sample 002, sample 003, etc.).

Each "sample" contains many different "data frames" for a large number of different data-items. (e.g. sat score, median income of zip-code, gender, GPA, etc)

The data frames and files are each named with the data-item name as the "prefix" of the name and the "sample number" as the suffix of the name. e.g. sat.001, income.001, sat.002, income.002
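To make the naming scheme concrete, here is a small sketch of how such names could be generated. The item names, the number of samples, and the variable names (items, samples, object.names, file.names) are assumptions for illustration; the ".r" file extension follows the example code later in this post.

```r
# Hypothetical item names and sample count, for illustration only.
items   <- c("sat", "income")
samples <- sprintf("%03d", 1:3)   # zero-padded sample numbers

# Every combination of data-item prefix and sample-number suffix,
# e.g. "sat.001", "income.001", "sat.002", ...
object.names <- as.vector(outer(items, samples, paste, sep = "."))

# File names on disk: object name plus an ".r" extension.
file.names <- paste(object.names, "r", sep = ".")
```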

Each data frame has approximately 5,000 rows, 1 for each "person".

Note: The files are somewhat large, and most of my analysis will be completed within each "sample". (Thus, I think that I should probably keep the files stored as separate files, and not combine them into a larger list or data frame. I also do not think I want to load all the files for multiple samples at once, as this may take up too much memory.) Also, I have simplified the description of the files; many contain multiple columns of data.


I have written a "for" loop that does the following:

  1. For each "sample period" I load two files.
  2. I perform a function on the data contained in these two files.
  3. I take the results and save them as a new file. I proceed to the next sample.

Is there a "better" (i.e. more elegant and/or efficient) way to do this, perhaps with one of the "apply" functions? (e.g. apply, sapply, lapply, tapply?)

#e.g. my simplified code

#this creates example data:



filenames=c('sat.001', 'sat.002', 'income.001', 'income.002')
sapply(filenames,function(x) { save( list=x , file = paste(x ,'.r', sep ='') ) })
#saves each object to its own file, e.g. sat.001.r

rm(sat.001,sat.002,income.001,income.002,filenames)
ls()

#my for loop

divide = function(x,y) {x/y}
#creates a custom function

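#inputs to my loop:
#(assumed example values, chosen to match the example files above)
samplenames = c('001', '002')
x.name = 'sat'
y.name = 'income'
fun = 'divide'
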
for (i in 1:length(samplenames) ) {

x.name.suf = paste(x.name,samplenames[i],sep='.')
#name of x file on hard drive

y.name.suf = paste(y.name,samplenames[i],sep='.')
#name of y file on hard drive

x=get(load(file = paste(x.name.suf ,'r', sep ='.') , envir = .GlobalEnv) )
#loads and gets the x file

y=get(load(file = paste(y.name.suf ,'r', sep ='.') , envir = .GlobalEnv) )
#loads and gets the y file

#applies custom function specified in arguments above
# to data contained in x and y files
temp = do.call(fun, list(x, y))
#(assumed call; fun holds the name of the function as a string)

save( list='temp' , file = paste(fun,x.name ,y.name,samplenames[i],sep='.') )
#save the results in files with name that specifies
#name of function, name of x, name of y, and sample number
#files will be used for later analysis

rm(list=x.name.suf)
rm(list=y.name.suf)
#removes the loaded objects before the next sample

}

rm(divide,samplenames,x.name,y.name,fun,i)
ls()
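
For comparison, here is a minimal sketch of how the same three steps (load two files, apply a function, save the result) might look with lapply instead of a for loop. It is an illustration under assumptions, not a claim that it runs faster: the example data values, the temporary directory, and the helper names load.one and process.sample are all hypothetical. Loading into a fresh environment avoids the later rm() calls, since nothing is left behind in the global workspace.

```r
# Hypothetical example data, one object per file, written to a
# temporary directory using the "<item>.<sample>.r" naming scheme.
data.dir <- tempfile("samples")
dir.create(data.dir)

sat.001    <- data.frame(score = c(1200, 1350, 1100))
sat.002    <- data.frame(score = c(1250, 1400, 1000))
income.001 <- data.frame(score = c(40, 45, 50))
income.002 <- data.frame(score = c(50, 35, 40))

for (nm in c("sat.001", "sat.002", "income.001", "income.002")) {
  save(list = nm, file = file.path(data.dir, paste(nm, "r", sep = ".")))
}
rm(sat.001, sat.002, income.001, income.002, nm)

# Load the single object stored in a file and return it, without
# leaving a copy in the global environment.
load.one <- function(path) {
  env <- new.env()
  obj.name <- load(path, envir = env)
  get(obj.name, envir = env)
}

# Process one sample: load x and y, apply the function named by
# 'fun' (a character string), save the result, return the file path.
process.sample <- function(sample.id, x.name, y.name, fun, data.dir) {
  x <- load.one(file.path(data.dir, paste(x.name, sample.id, "r", sep = ".")))
  y <- load.one(file.path(data.dir, paste(y.name, sample.id, "r", sep = ".")))
  temp <- do.call(fun, list(x, y))
  out.file <- file.path(data.dir,
                        paste(fun, x.name, y.name, sample.id, sep = "."))
  save(temp, file = out.file)
  out.file
}

divide <- function(x, y) x / y
samplenames <- c("001", "002")

# One call per sample; only the output file paths are kept in memory.
out.files <- lapply(samplenames, function(s) {
  process.sample(s, "sat", "income", "divide", data.dir)
})
```

Since the work here is dominated by reading and writing files, lapply is mainly a tidier way to write the loop; each sample is still processed one at a time.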


R-help_at_stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Wed 20 Jun 2007 - 19:04:37 GMT
