From: Laszlo <Laszlo.Bodnar_at_erstebank.hu>

Date: Wed, 16 Mar 2011 09:35:49 -0700 (PDT)

Hello Ivan,

Thank you very much for your comments; they were really useful, and I’ll try to remember and use them in the future.

Getting back to my problem… well, I’ll try to put it a different way, because I’m afraid this is gonna be a little more difficult than I thought.

So, here is my updated data frame (it is a bit closer to my original data than the 'df' data frame in my previous letter, although still simplified).

id <- c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3)
a <- c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
b <- c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2)
c <- c(1,3,2,3,2,1,2,3,3,2,2,3,1,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2)
d <- c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2)
e <- c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,2,1,3)
df <- data.frame(id,a,b,c,d,e)

df

Basically, what I would like to do is get the distribution of the numbers in each column (a, b, c, d, e) within each group (1, 2, 3), where the groups are given by my column 'id'.

So, for column 'a' and group '1' (the group is given by column 'id'):

as.numeric(table(df[1:10,2]))[1]/sum(as.numeric(table(df[1:10,2])))
as.numeric(table(df[1:10,2]))[2]/sum(as.numeric(table(df[1:10,2])))

The first line gives: [1] 0.3, and the second gives: [1] 0.7

Just to briefly explain my results: in column 'a' (and regarding only those records which have number '1' in column 'id') we can say that:

number 1 occurred 3 times, and
number 3 occurred 7 times.

3 / (3+7) = 0.3, and 7 / (3+7) = 0.7
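
The same two proportions can probably be obtained a bit more directly by selecting the rows through the 'id' column and letting prop.table() do the division. This is only a sketch: it rebuilds just the 'id' and 'a' columns from above, and the names counts_1 and props_1 are made up for the illustration.

```r
# Example data from the message; rep() produces the same 'id' values
# (ten 1s, ten 2s, ten 3s) as the long c(...) call.
id <- rep(c(1, 2, 3), each = 10)
a  <- c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
df <- data.frame(id, a)

# Count the values of column 'a' among the rows where id == 1 ...
counts_1 <- table(df$a[df$id == 1])

# ... and divide every count by the total number of those rows.
props_1 <- prop.table(counts_1)
props_1
```

Selecting with df$id == 1 instead of 1:10 means the row positions do not have to be known in advance.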

Again, just to show you another example, for column 'a' and group '2' (the group is again given by column 'id'):

as.numeric(table(df[11:20,2]))[1]/sum(as.numeric(table(df[11:20,2])))
as.numeric(table(df[11:20,2]))[2]/sum(as.numeric(table(df[11:20,2])))
as.numeric(table(df[11:20,2]))[3]/sum(as.numeric(table(df[11:20,2])))

After running the code, the results are: 0.4, 0.3, 0.3.

Let me explain a little again: in column 'a' (and regarding only those observations which have number '2' in column 'id') we can say that:

number 1 occurred 4 times,
number 2 occurred 3 times, and
number 3 occurred 3 times.

Now the results are obvious: 4/10 = 0.4, 3/10 = 0.3, 3/10 = 0.3.

So this is what I would like to do: calculate the distributions for each "custom-defined" subset and then collect these values into a data frame.

The reason I wanted to sort out the problem with indices like 'i', 'k' etc. (you know, we discussed it previously) is that I’m gonna have to change the input 'df' data frame on a regular basis, and hence both the overall number of rows and the number of columns might change over time…
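
To make the goal concrete, here is one possible sketch that avoids hard-coded row and column positions. It assumes the grouping column is always called 'id' and that every other column should be tabulated; value_cols, dist_list and result are names invented for this illustration, not anything agreed on in the thread.

```r
# Example data from the message; rep() gives the same 'id' vector as above.
id <- rep(c(1, 2, 3), each = 10)
a <- c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
b <- c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2)
c <- c(1,3,2,3,2,1,2,3,3,2,2,3,1,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2)
d <- c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2)
e <- c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,2,1,3)
df <- data.frame(id, a, b, c, d, e)

# Every column except the grouping column, however many there are.
value_cols <- setdiff(names(df), "id")

# For each column: cross-tabulate id against the values, turn each row of
# the table (one id group) into proportions, and reshape to long format.
dist_list <- lapply(value_cols, function(col) {
  tab <- prop.table(table(id = df$id, value = df[[col]]), margin = 1)
  out <- as.data.frame(tab, responseName = "proportion",
                       stringsAsFactors = FALSE)
  data.frame(column = col, out, stringsAsFactors = FALSE)
})

# Stack the per-column pieces into one data frame with the columns
# 'column', 'id', 'value' and 'proportion'.
result <- do.call(rbind, dist_list)

# The rows for column 'a' in group 2.
result[result$column == "a" & result$id == "2", ]
```

One difference from the hand-written version: a value that never occurs in a group (for example the number 2 in column 'a' for group 1) shows up here with proportion 0 instead of being left out. Those rows could be dropped afterwards with result[result$proportion > 0, ] if they are not wanted.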

Thank you again,

Laszlo

--
View this message in context: http://r.789695.n4.nabble.com/changing-one-character-in-the-name-of-dataframes-repeatedly-tp3348390p3382288.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Wed 16 Mar 2011 - 17:29:22 GMT
