From: gj <gawesh_at_gmail.com>

Date: Sat, 14 May 2011 17:42:30 +0100

1,34,2

1,35,7

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sat 14 May 2011 - 16:47:10 GMT

Date: Sat, 14 May 2011 17:42:30 +0100

Hi Jim,

Thanks very much for the code. I modified it a bit because I needed to
allocate the track ratings by userid (eg if user 1 plays track x once, he
gets rating 1, user 1 plays track y 100 times, he gets a rating 5) and not
by track (sorry if this wasn't clear in my original post).

1,1,1 1,2,10 1,3,1 1,4,1 1,5,15 1,6,4 1,7,16 1,8,6 1,9,1 1,10,1 1,11,2 1,12,2 1,13,1 1,14,6 1,15,7 1,16,13 1,17,3 1,18,2 1,19,5 1,20,2 1,21,2 1,22,6 1,23,4 1,24,1 1,25,1 1,26,16 1,27,4 1,28,1 1,29,4 1,30,4 1,31,4 1,32,1 1,33,14

1,34,2

1,35,7

*>tq
*

$'1'

20% 40% 60% 80% 100%

1 2 4 7 16

*>From this distribution I would expect to distribute ratings as follows:
*

freq: <=1 <=2 <=4 <=7 <=16

rating: 1 2 3 4 5

But my output is (the ratings are correct except at the cut off points):
"userid" "freq" "track" "rating"

1 1 1 2

1 10 2 5

1 1 3 2

1 1 4 2

1 15 5 5

1 4 6 4

1 16 7 5

1 6 8 4

1 1 9 2

1 1 10 2

1 2 11 3

1 2 12 3

1 1 13 2

1 6 14 4

1 7 15 5

1 13 16 5

1 3 17 3

1 2 18 3

1 5 19 4

1 2 20 3

1 2 21 3

1 6 22 4

1 4 23 4

1 1 24 2

1 1 25 2

1 16 26 5

1 4 27 4

1 1 28 2

1 4 29 4

1 4 30 4

1 4 31 4

1 1 32 2

1 14 33 5

1 2 34 3

1 7 35 5

This is the code:

# cheers jim holtman

x=read.csv(file="C:\\Data\\lastfm\\ratings\\play_history_3.csv",header=T,
sep=',')

# get the quantiles for each track

tq <- tapply(x$freq,x$userid,quantile,prob=c(0.2,0.4,0.6,0.8,1))
# create a matrix with the rownames as the tracks to use in the findInterval
tqm <- do.call(rbind, tq)

#now put the ratings

require(data.table)

x.dt <- data.table(x)

x.new <- x.dt[,list(freq = freq,track=track,rating =
findInterval(freq,tqm[as.character(userid[1L]),], rightmost.closed = TRUE) +
1L),by=userid]

Regards

Gawesh

On Sun, May 8, 2011 at 10:42 PM, jim holtman <jholtman_at_gmail.com> wrote:

> try this:

*>
**> > # create some data
**> > x <- data.frame(userid = paste('u', rep(1:20, each = 20), sep = '')
**> + , track = rep(1:20, 20)
**> + , freq = floor(runif(400, 10, 200))
**> + , stringsAsFactors = FALSE
**> + )
**> > # get the quantiles for each track
**> > tq <- tapply(x$freq, x$track, quantile, prob = c(.2, .4, .6, .8, 1))
**> > # create a matrix with the rownames as the tracks to use in the
**> findInterval
**> > tqm <- do.call(rbind, tq)
**> > # now put the ratings
**> > require(data.table)
**> > x.dt <- data.table(x)
**> > x.new <- x.dt[,
**> + list(userid = userid
**> + , freq = freq
**> + , rating = findInterval(freq
**> + # use track as index into
**> quantile matrix
**> + , tqm[as.character(track[1L]),]
**> + , rightmost.closed = TRUE
**> + ) + 1L
**> + )
**> + , by = track]
**> >
**> > head(x.new)
**> track userid freq rating
**> [1,] 1 u1 10 1
**> [2,] 1 u2 15 1
**> [3,] 1 u3 126 4
**> [4,] 1 u4 117 3
**> [5,] 1 u5 76 2
**> [6,] 1 u6 103 3
**> >
**>
**>
**> On Sun, May 8, 2011 at 2:48 PM, gj <gawesh_at_gmail.com> wrote:
**> > Hi,
**> >
**> > I have a mysql table with fields userid,track,frequency e.g
**> > u1,1,10
**> > u1,2,100
**> > u1,3,110
**> > u1,4,200
**> > u1,5,120
**> > u1,6,130
**> > .
**> > u2,1,23
**> > .
**> > .
**> > where "frequency" is the number of times a music track is played by a
**> > "userid"
**> >
**> > I need to turn my 'frequency' table into a rating table (it's for a
**> > recommender system). So, for each user, I need to categorise the
**> frequency
**> > of tracks played by quintile so that each particular track can have 5
**> > ratings (1-5), with the ratings allocated as follows: inter-quintile
**> range
**> > 100-80% = rating 5, inter-quintile range 80-60% = rating 4,
**> > ..., inter-quintile range 20-0% = rating 1)
**> >
**> > Hence, I need to create a table with fields userid,track,rating:
**> > u1,1,1
**> > u1,2, 3
**> > ...
**> >
**> > Can anybody help me to do this with R?
**> >
**> > Regards
**> > Gawesh
**> >
**> > [[alternative HTML version deleted]]
**> >
**> > ______________________________________________
**> > R-help_at_r-project.org mailing list
**> > https://stat.ethz.ch/mailman/listinfo/r-help
**> > PLEASE do read the posting guide
**> http://www.R-project.org/posting-guide.html
**> > and provide commented, minimal, self-contained, reproducible code.
**> >
**>
**>
**>
**> --
**> Jim Holtman
**> Data Munger Guru
**>
**> What is the problem that you are trying to solve?
**>
*

[[alternative HTML version deleted]]

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sat 14 May 2011 - 16:47:10 GMT

This quarter's messages: by month, or sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.2.0, at Sat 14 May 2011 - 19:00:07 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*