Re: [R] aggregate by part of a field

From: Henrique Dallazuanna <wwwhsd_at_gmail.com>
Date: Thu, 10 Mar 2011 15:04:26 -0300

Try this:

rowsum(a$sales, gsub("(\\w*\\s\\w*\\s\\w*).*", "\\1", a$product))
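
As a rough sketch of what that line does, here it is applied to the data frame from the original question quoted below. The names `key` and `totals` are only illustrative; the regular expression keeps the first three whitespace-separated words of each product string, so it should also cope with multi-letter items such as "a bdfd c".

```r
# Example data from the original question (quoted further down).
a <- data.frame(
  date    = c(20081201, 20081202, 20081201),
  product = c("a b c d e", "a b c g h t", "d e h a c e h g"),
  sales   = c(1, 2, 3)
)

# Keep the first three whitespace-separated words and drop the rest.
# 'key' is an illustrative name, not part of the original reply.
key <- gsub("(\\w*\\s\\w*\\s\\w*).*", "\\1", a$product)

# rowsum() sums sales within each key and returns a one-column matrix
# whose row names are the keys.
totals <- rowsum(a$sales, key)
print(totals)
```

If a data frame is preferred over a matrix, storing the key as a column and calling aggregate(sales ~ key, data = a, FUN = sum) should give the same totals.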

On Thu, Mar 10, 2011 at 12:20 PM, Hui Du <Hui.Du_at_dataventures.com> wrote:

>
>
> Thank you for your reply. I didn't state my problem very clearly in my
> previous post. The data could look like
>
> a = data.frame(date = c(20081201, 20081202, 20081201),
>                product = c("a b c d e", "a bdfd c g h t", "def e h a c e h g"),
>                sales = c(1, 2, 3))
>
> The first three items in "product" are the key: "a b c", "a bdfd c" and
> "def e h" in my example. The key is not necessarily just three letters, so
> substr may not work.
>
> Sorry for the confusion.
>
> HXD
>
>
> From: Dennis Murphy [mailto:djmuser_at_gmail.com]
> Sent: Wednesday, March 09, 2011 11:30 PM
> To: Hui Du
> Cc: r-help_at_r-project.org
> Subject: Re: [R] aggregate by part of a field
>
> Hi:
>
> Here's one approach, although I imagine there are more efficient ways.
>
> # A function to strip spaces and return the first three non-blank elements
> # of a string
> keyset <- function(x) substr(gsub(' ', '', x)[1], 1, 3)
>
> # Apply the function to the data frame to generate the key:
> a$key <- sapply(a$product, keyset)
> > a
>       date         product sales key
> 1 20081201       a b c d e     1 abc
> 2 20081202     a b c g h t     2 abc
> 3 20081201 d e h a c e h g     3 deh
>
> # Use aggregate to sum sales by key:
> aggregate(sales ~ key, data = a, FUN = sum)
>   key sales
> 1 abc     3
> 2 deh     3
>
> HTH,
> Dennis
> On Wed, Mar 9, 2011 at 6:02 PM, Hui Du <Hui.Du_at_dataventures.com> wrote:
>
> Hi All,
>
> I have a data frame like
>
> a = data.frame(date = c(20081201, 20081202, 20081201),
>                product = c("a b c d e", "a b c g h t", "d e h a c e h g"),
>                sales = c(1, 2, 3))
>
> Now I want to aggregate the sales by part of a$product.
> 'Product' is the product name, a string of items separated by spaces. The
> key for my aggregation is the first three items in the "product" field. In
> my example, the keys are "a b c", "a b c" and "d e h", respectively. Do you
> know how to do this? I thought of an awkward way that needs several function
> calls (strsplit, lapply, paste, etc.) to manipulate the string in the
> 'product' field. I guess there is a more elegant way to do it.
>
> Thanks in advance.
>
>
> HXD
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

Received on Thu 10 Mar 2011 - 18:12:23 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 10 Mar 2011 - 18:20:20 GMT.
