Re: [R] help with mysql and R: partitioning by quintile

From: jim holtman <jholtman_at_gmail.com>
Date: Sun, 08 May 2011 17:42:06 -0400

try this:

> # create some data
> x <- data.frame(userid = paste('u', rep(1:20, each = 20), sep = '')

+               , track = rep(1:20, 20)
+               , freq = floor(runif(400, 10, 200))
+               , stringsAsFactors = FALSE
+               )

> # get the quantiles for each track
> tq <- tapply(x$freq, x$track, quantile, prob = c(.2, .4, .6, .8, 1))
> # create a matrix with the rownames as the tracks to use in the findInterval
> tqm <- do.call(rbind, tq)
> # now put the ratings
> require(data.table)
> x.dt <- data.table(x)
> x.new <- x.dt[,
+               list(userid = userid
+                  , freq = freq
+                  , rating = findInterval(freq
+                                        # use track as index into
quantile matrix
+                                        , tqm[as.character(track[1L]),]
+                                        , rightmost.closed = TRUE
+                                        ) + 1L
+                  )
+              , by = track]

>
> head(x.new)
     track userid freq rating
[1,]     1     u1   10      1
[2,]     1     u2   15      1
[3,]     1     u3  126      4
[4,]     1     u4  117      3
[5,]     1     u5   76      2
[6,]     1     u6  103      3

>

On Sun, May 8, 2011 at 2:48 PM, gj <gawesh_at_gmail.com> wrote:
> Hi,

>

> I have a mysql table with fields userid,track,frequency e.g
> u1,1,10
> u1,2,100
> u1,3,110
> u1,4,200
> u1,5,120
> u1,6,130
> .
> u2,1,23
> .
> .
> where "frequency" is the number of times a music track is played by a
> "userid"
>

> I need to turn my 'frequency' table into a rating table (it's for a
> recommender system). So, for each user, I need to categorise the frequency
> of tracks played by quintile so that each particular track can have 5
> ratings (1-5), with the ratings allocated as follows: inter-quintile range
> 100-80% = rating 5,   inter-quintile range 80-60% = rating 4,
> ..., inter-quintile range 20-0% = rating 1)
>

> Hence, I need to create a table with fields userid,track,rating:
> u1,1,1
> u1,2, 3
> ...
>

> Can anybody help me to do this with R?
>

> Regards
> Gawesh
>

>        [[alternative HTML version deleted]]
>

> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Sun 08 May 2011 - 21:49:08 GMT

This quarter's messages: by month, or sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

All messages

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 14 May 2011 - 17:20:07 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive