From: Stephen Tucker <brown_emu_at_yahoo.com>

Date: Sat 31 Mar 2007 - 01:41:39 GMT

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sat Mar 31 11:46:44 2007

Hi Sergey,

I believe the code below should get you close to what you want.

For dates, I usually store them as "POSIXct" classes in data frames, but according to Gabor Grothendieck and Thomas Petzoldt's R Help Desk article <http://cran.r-project.org/doc/Rnews/Rnews_2004-1.pdf>, I should probably be using "chron" date and times...

Nonetheless, POSIXct classes are what I know, so I can show you how to get the month out of your column with them (replace "8.29.97" with your variable):

month = format(strptime("8.29.97",format="%m.%d.%y"),format="%m")

Or,

month = as.data.frame(strsplit("8.29.97","\\."))[1,]
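Both one-liners above work on a single string. As a minimal sketch of how the same idea might be applied to a whole column, the block below uses a made-up vector named dates (an assumption standing in for the real "date" factor) and converts it to character before parsing:

```r
# Hypothetical stand-in for the "date" column, stored as a factor
dates <- factor(c("8.29.97", "9.30.97", "10.31.97"))

# Parse month.day.year strings into Date objects
parsed <- as.Date(as.character(dates), format = "%m.%d.%y")

# Two-digit month as a character vector
month <- format(parsed, "%m")

# Alternative without date parsing: keep the piece before the first dot
# (strsplit() needs character input, hence as.character())
month2 <- sapply(strsplit(as.character(dates), ".", fixed = TRUE), `[`, 1)
```

Note that the two approaches differ in padding: format() gives "08" while the strsplit() version gives "8".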

In any case, here is some code in which I define and apply a series of functions (effectively, successive applications of split() and lapply()).

Best regards,

ST

# define data (I just made this up)

df <- data.frame(month=as.character(rep(1:3,each=30)),
                 fac=factor(rep(1:2,each=15)),
                 data1=round(runif(90),2),
                 data2=round(runif(90),2))

# define functions to split the data and another
# to get statistics

doSplits <- function(df) {
  unlist(lapply(split(df,df$month),function(x) split(x,x$fac)),
         recursive=FALSE)
}
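As an aside, split() also accepts a list of grouping factors, so the two-level split in doSplits() can probably be done in one call. This is only a sketch on made-up data shaped like the example above; the name groups is an assumption:

```r
set.seed(1)
# Made-up data with the same layout as the example data frame
df <- data.frame(month = as.character(rep(1:3, each = 30)),
                 fac   = factor(rep(rep(1:2, each = 15), times = 3)),
                 data1 = round(runif(90), 2),
                 data2 = round(runif(90), 2))

# One data frame per month/fac combination, named "month.fac"
groups <- split(df, list(df$month, df$fac))

names(groups)
sapply(groups, nrow)
```

The pieces carry the same "month.fac" labels that doSplits() produces, possibly in a different order.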

getStats <- function(x,f) {
  # keep the numeric, non-factor columns and apply f to each of them
  isNum <- unlist(lapply(x,mode))=="numeric" &
           unlist(lapply(x,class))!="factor"
  return(as.data.frame(lapply(x[isNum],f)))
}

# create a matrix of data, means, and standard deviations
listMatrix <- cbind(Data=doSplits(df),
                    Means=lapply(doSplits(df),getStats,mean),
                    SDs=lapply(doSplits(df),getStats,sd))

# function to subtract means and divide by standard deviations
transformData <- function(x) {
  newdata <- x$Data
  matchedNames <- match(names(x$Means),names(x$Data))
  newdata[matchedNames] <-
    sweep(sweep(data.matrix(x$Data[matchedNames]),2,unlist(x$Means),"-"),
          2,unlist(x$SDs),"/")
  return(newdata)
}
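To check what the nested sweep() calls in transformData() do, the sketch below compares them with scale() on a single made-up group. The data frame g and the names means, sds, bySweep and byScale are assumptions for illustration only:

```r
set.seed(1)
# One made-up group with two numeric columns
g <- data.frame(data1 = runif(15), data2 = runif(15))

means <- sapply(g, mean)
sds   <- sapply(g, sd)

# Subtract column means, then divide by column standard deviations
bySweep <- sweep(sweep(data.matrix(g), 2, means, "-"), 2, sds, "/")

# scale() centers and scales each column in one step
byScale <- scale(data.matrix(g))

# Same numbers; scale() only adds attributes recording center and scale
all.equal(bySweep, byScale, check.attributes = FALSE)
```

So within each group the transformation is an ordinary column-wise standardization.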

# apply to data

newDF <- lapply(as.data.frame(t(listMatrix)),transformData)

# Define Fold function

Fold <- function(f, x, L) {
  for(e in L) x <- f(x, e)
  x  # return the accumulated value; in current R a for loop itself evaluates to NULL
}

# Apply this to the data

finalData <- Fold(rbind,vector(),newDF)
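Base R also has Reduce() and do.call(), which can play the role of Fold() when stacking a list of data frames. A small sketch on made-up pieces (the names pieces, viaReduce and viaDoCall are assumptions):

```r
# Three small made-up data frames with identical columns
pieces <- list(data.frame(a = 1:2, b = c(0.1, 0.2)),
               data.frame(a = 3:4, b = c(0.3, 0.4)),
               data.frame(a = 5:6, b = c(0.5, 0.6)))

# Fold rbind over the list, pairwise from the left
viaReduce <- Reduce(rbind, pieces)

# Or hand all pieces to a single rbind() call
viaDoCall <- do.call(rbind, pieces)

nrow(viaReduce)
nrow(viaDoCall)
```

With many pieces, the do.call() form is usually the quicker of the two, since it binds everything in one call instead of growing the result step by step.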

--- Sergey Goriatchev <sergeyg@gmail.com> wrote:

> Hi, fellow R users.
>
> I have a question about sapply and split combination.
>
> I have a big dataframe (40000 observations, 21 variables). First
> variable (factor) is "date" and it is in format "8.29.97", that is, I
> have monthly data. Second variable (also factor) has levels 1 to 6
> (fractiles 1 to 5 and missing value with code 6). The other 19
> variables are numeric.
> For each month I have several hundred observations of 19 numeric and 1
> factor.
>
> I am normalizing the numeric variables by dividing val1 by val2, where:
>
> val1: (for each month, for each numeric variable) difference between
> mean of ith numeric variable in fractile 1, and mean of ith numeric
> variable in fractile 5.
>
> val2: (for each month, for each numeric variable) standard deviation
> for ith numeric variable.
>
> Basically, as far as I understand, I need to use split() function several
> times.
> To calculate val1 I need to use split() twice - first to split by
> month and then split by fractile. Is this even possible to do (since
> after first application of split() I get a list)??
>
> Is there a smart way to perform this normalization computation?
>
> My knowledge of R is not so advanced, but I need to know an efficient
> way to perform calculations of this kind.
>
> Would really appreciate some help from experienced R users!
>
> Regards,
> S
>
> --
> Laziness is nothing more than the habit of resting before you get tired.
> - Jules Renard (writer)
>
> Experience is one thing you can't get for nothing.
> - Oscar Wilde (writer)
>
> When you are finished changing, you're finished.
> - Benjamin Franklin (Diplomat)
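
The val1/val2 normalization described in the question can also be sketched with a single split() by month, picking out fractiles 1 and 5 inside each piece. This is only an illustration on made-up data: the data frame dat, the column names x1 and x2, and the helper normalizeMonth are all assumptions, and val2 is taken here to be the standard deviation over all of the month's observations.

```r
set.seed(42)
# Made-up data: 2 months, fractile codes 1..6, two numeric variables
dat <- data.frame(date     = rep(c("8.29.97", "9.30.97"), each = 60),
                  fractile = factor(rep(1:6, times = 20)),
                  x1       = rnorm(120),
                  x2       = rnorm(120))
numCols <- c("x1", "x2")

# For one month: (mean in fractile 1 - mean in fractile 5) / sd over the month
normalizeMonth <- function(m) {
  mean1 <- colMeans(m[m$fractile == "1", numCols, drop = FALSE])
  mean5 <- colMeans(m[m$fractile == "5", numCols, drop = FALSE])
  sdAll <- sapply(m[numCols], sd)
  (mean1 - mean5) / sdAll
}

# One row per month, one column per numeric variable
result <- t(sapply(split(dat, dat$date), normalizeMonth))
result
```

If val2 is instead meant per fractile, only the sdAll line would need to change.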

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.1.8, at Sat 31 Mar 2007 - 19:30:34 GMT.
