Re: [R] calculating the occurrences of distinct observations in the subsets of a dataframe

From: Tóth Dénes <tdenes_at_cogpsyphy.hu>
Date: Thu, 17 Mar 2011 12:43:06 +0100 (CET)

Hi!

Sorry, I made an error in the previous e-mail. So try this:
by(df[,-1],df$id,function(x) apply(x,2,tabulate))

This gives you a list. You can rearrange it into a data frame or a 3d array if you wish.
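
One possible way to do that rearrangement, as a rough sketch rather than the only option: fix the number of bins with the `nbins` argument of tabulate() so that every group and column yields a vector of the same length (values that do not occur get a zero count, unlike table(), which drops them), then stack the per-id matrices into a 3d array. The names `vars`, `nb`, `counts`, `arr` and `long` are only illustrative, and the sketch assumes that all non-id columns hold positive integers.

```r
# Sample data from the original question (id written with rep() for brevity)
id <- rep(1:3, each = 10)
a <- c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
b <- c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2)
c <- c(1,3,2,3,2,1,2,3,3,2,2,3,1,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2)
d <- c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2)
e <- c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,2,1,3)
df <- data.frame(id, a, b, c, d, e)

# Columns to count and the largest value seen in any of them
# (assumption: every non-id column contains positive integers)
vars <- setdiff(names(df), "id")
nb <- max(unlist(df[vars]))

# One nb-by-length(vars) count matrix per id; nbins keeps the shapes equal
counts <- lapply(split(df[vars], df$id),
                 function(x) sapply(x, tabulate, nbins = nb))

# Stack the matrices into a 3d array: value x column x id
arr <- array(unlist(counts),
             dim = c(nb, length(vars), length(counts)),
             dimnames = list(value = seq_len(nb), column = vars,
                             id = names(counts)))

# Long data frame with one row per (value, column, id) combination
long <- as.data.frame(as.table(arr), responseName = "count")
```

With this layout, arr[, "a", "2"] holds the counts of the values 1, 2 and 3 in column 'a' for id 2, and nothing depends on a fixed number of rows or columns in 'df'.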

Regards,
  Denes

> Hello everybody,
>
> I have a data frame in R which is similar to the following. Actually my real
> 'df' dataframe is much bigger than this one here but I really do not want
> to confuse anybody so that is why I try to simplify things as much as
> possible.
>
> So here's the data frame.
>
> id <-c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3)
> a <-c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
> b <-c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2)
> c <-c(1,3,2,3,2,1,2,3,3,2,2,3,1,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2)
> d <-c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2)
> e <-c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,2,1,3)
> df <-data.frame(id,a,b,c,d,e)
> df
>
> Basically what I would like to do is to get the occurrences of numbers for
> each column (a,b,c,d,e) and for each id group (1,2,3) (for this latter
> grouping see my column 'id').
>
> So, for column 'a' and for id number '1' (for the latter see column 'id')
> the code would be something like this:
> as.numeric(table(df[1:10,2]))
>
> The results are:
> [1] 3 7
>
> Just to briefly explain my results: in column 'a' (and regarding only
> those records which have number '1' in column 'id') we can say that:
> number 1 occurred 3 times, and
> number 3 occurred 7 times.
>
> Again, just to show you another example. For column 'a' and for id number
> '2' (for the latter grouping see again column 'id'):
> as.numeric(table(df[11:20,2]))
>
> After running the code the results are: [1] 4 3 3
>
> Let me explain a little again: in column 'a' (and regarding only those
> observations which have number '2' in column 'id') we can say that
> number 1 occurred 4 times,
> number 2 occurred 3 times, and
> number 3 occurred 3 times.
>
> Last example: for column 'e' and for id number '3' the code would be:
> as.numeric(table(df[21:30,6]))
>
> With the results:
> [1] 1 4 5
>
> ...meaning that number '1' occurred once, number '2' occurred four times and
> number '3' occurred five times.
>
> So this is what I would like to do: calculate the occurrences of numbers
> for each custom-defined subset (and then collect these values into a
> data frame). I know it is NOT a difficult task but the PROBLEM is that I'm
> going to have to change the input 'df' dataframe on a regular basis and hence
> both the overall number of rows and columns might CHANGE over time...
>
> What I have done so far is that I have separated the 'df' dataframe by
> columns, like this:
> for (z in (2:ncol(df))) assign(paste("df",z,sep="."),df[,z])
>
> So df.2 will refer to df$a, df.3 will equal df$b, df.4 will equal df$c
> etc. But I'm really stuck now and I don't know how to move forward, you
> know, getting the occurrences for each column and each group of ids.
>
> Do you have any ideas?
> Best regards,
> Laszlo
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Received on Thu 17 Mar 2011 - 11:48:06 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 17 Mar 2011 - 11:50:22 GMT.
