From: Jorge Velez <jorgeivanvelez_at_gmail.com>

Date: Tue, 08 Apr 2008 16:56:36 -0400

R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Tue 08 Apr 2008 - 21:00:14 GMT

Hi Tania,

I think a chi-squared test could be suitable. I tried it on your data set, and here is what I got:

# ----------------

# Data set

set.seed(123)

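# rnorm(10) draws 10 values (rnorm(1:10) does the same); data.frame() recycles
# them to fill 100 rows. Building the columns directly avoids cbind(), which
# would turn "val" into text. The explicit factor levels keep all five groups
# in table() even if one of them is missing from the top 10.
d <- data.frame(val = rnorm(10)^2,
                group = factor(sample(LETTERS[1:5], 100, replace = TRUE), levels = LETTERS[1:5]))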
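A note on how the example data are built. In the thread, the data frame comes from data.frame(cbind(...)), and "val" is then converted back with as.numeric(as.character(...)). The reason is that cbind() on a numeric and a character vector returns a character matrix, and before R 4.0.0 data.frame() also turned text columns into factors by default; as.numeric() on a factor returns the level codes, not the values. The data also contain only 10 distinct values (rnorm() with a vector argument uses its length), recycled over 100 rows, so the ten largest rows all share one value. A minimal sketch that checks these points; the names m, f, d and top10 are illustrative only:

```r
set.seed(123)

# cbind() of a numeric and a character vector returns a character matrix:
# the squared normal values are stored as text
m <- cbind(val = rnorm(1:10)^2, group = sample(LETTERS[1:5], 100, replace = TRUE))
is.character(m)                 # TRUE
dim(m)                          # 100 2: the 10 values of 'val' are recycled

# as.numeric() on a factor gives the internal level codes, not the values,
# hence the as.numeric(as.character(...)) idiom for factor columns
f <- factor(c("10", "2"))
as.numeric(f)                   # 1 2 (level codes)
as.numeric(as.character(f))     # 10 2

# Rebuild a data frame with a numeric 'val' column
d <- data.frame(val = as.numeric(m[, "val"]), group = m[, "group"])
length(unique(d$val))           # 10 distinct values in 100 rows

# Each value appears exactly 10 times, so the 10 largest rows
# are the 10 copies of the single largest value
top10 <- d[order(d$val, decreasing = TRUE), ][1:10, ]
length(unique(top10$val))       # 1
```

In practice this means the "top 10" in the example is every tenth row, so for a real analysis the values would presumably be drawn with rnorm(100).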

# Rank "d" in decreasing order of "val" and count the number of
# observations from each group among the first 10 rows
TABLE <- table(d[order(d$val, decreasing = TRUE), ][1:10, "group"])
TABLE

A B C D E
3 2 3 1 1

# Chi-squared

cht=chisq.test(TABLE)

cht

Chi-squared test for given probabilities

data: TABLE

X-squared = 2, df = 4, p-value = 0.7358

cht$p.value

[1] 0.7357589
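The reported statistic can be checked by hand. Called on a single table without a p argument, chisq.test() tests against equal probabilities, so each of the 5 groups expects 10/5 = 2 of the top 10 rows, and X-squared = sum((O - E)^2 / E) with 5 - 1 = 4 degrees of freedom. The sketch below reproduces the numbers above from the printed counts (3 2 3 1 1). Because every expected count is below 5, chisq.test() also warns that the chi-squared approximation may be incorrect; its simulate.p.value argument gives a Monte Carlo p-value instead. The variable names are illustrative.

```r
# Counts of each group among the top 10 rows, as printed above
obs <- c(A = 3, B = 2, C = 3, D = 1, E = 1)

# Equal expected proportions (chisq.test()'s default): 10 / 5 = 2 per group
expected <- rep(sum(obs) / length(obs), length(obs))

x2 <- sum((obs - expected)^2 / expected)   # 2
df <- length(obs) - 1                      # 4
p  <- pchisq(x2, df, lower.tail = FALSE)   # 0.7357589 (= 2 * exp(-1))

# Same p-value from chisq.test(); the small-expected-count warning is
# suppressed here only to keep the output short
p_test <- suppressWarnings(chisq.test(obs))$p.value

# Monte Carlo p-value, which does not rely on the chi-squared approximation
set.seed(1)
p_sim <- chisq.test(obs, simulate.p.value = TRUE, B = 10000)$p.value
```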

Hope this helps,

Jorge

On Tue, Apr 8, 2008 at 11:24 AM, Tania Oh <tania.oh_at_bnc.ox.ac.uk> wrote:

> Dear All,
>
> I do apologise if this question is out of place for this list but I've
> tried searching mailing lists and read "Introductory Statistics with
> R" by Peter Dalgaard, but couldn't find any hints on solving my
> question below:
>
> I have a data frame (d) of values which I will rank in decreasing
> order of "val". Each value belongs to a group, either 'A', 'B', 'C',
> 'D', or 'E'. I then take the first 10 entries in data frame 'd' and
> count the number of occurrences for each of the groups. I want to
> test if certain groups occur more frequently than by chance in my
> first 10 entries. Would a chi-square test or a hypergeometric test be
> more suitable? If neither, what would be an alternative solution in
> R? Below is my data:
>
> ## data
> L5 <- LETTERS[1:5]
> d <- data.frame(cbind(val= rnorm(1:10)^2, group=sample(L5,100,
> repl=TRUE)))
>
> str(d)
> ##'data.frame': 100 obs. of 2 variables:
> ##$ val : Factor w/ 10 levels "0.000169268449333046",..: 10 3 5 6 1 2 7 8 4 9 ...
> ##$ group: Factor w/ 5 levels "A","B","C","D",..: 4 4 4 5 3 1 5 2 1 2 ...
>
> Many thanks in advance and apologies again,
> tania
>
> D. phil student
> Department of Physiology, Anatomy and Genetics
> University of Oxford

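Tania also asked about a hypergeometric test. One common way to frame it (not the approach taken in Jorge's reply) is a per-group over-representation test: the top k = 10 rows are a draw without replacement from all N = 100 rows, so for a group with m members overall, the number of its members among the top 10 follows a hypergeometric distribution, and the one-sided p-value P(X >= x) is phyper(x - 1, m, N - m, k, lower.tail = FALSE). Unlike the equal-proportions chi-squared test, this accounts for how often each group occurs in the whole data set. A minimal sketch, assuming the example data from the thread (with "group" made an explicit factor so all five groups are counted); the variable names and the Benjamini-Hochberg adjustment across the five tests are illustrative choices:

```r
set.seed(123)
# Example data as in the thread (10 values recycled over 100 rows)
d <- data.frame(val = rnorm(10)^2,
                group = factor(sample(LETTERS[1:5], 100, replace = TRUE),
                               levels = LETTERS[1:5]))

k <- 10                                   # size of the "top" set
top <- d[order(d$val, decreasing = TRUE), ][1:k, "group"]

x <- as.vector(table(top))                # group counts among the top k rows
m <- as.vector(table(d$group))            # group sizes among all rows
N <- nrow(d)

# P(X >= x) for X ~ Hypergeometric(m group rows, N - m other rows, k drawn)
p_hyper <- phyper(x - 1, m, N - m, k, lower.tail = FALSE)
names(p_hyper) <- levels(d$group)

# Five groups are tested, so adjust for multiple comparisons
p_adj <- p.adjust(p_hyper, method = "BH")
data.frame(top = x, total = m, p = round(p_hyper, 4), p.BH = round(p_adj, 4))

# Cross-check for group A: a one-sided Fisher exact test on the 2x2 table
# (top k vs. rest) x (group A vs. other groups) gives the same tail probability
tab_A <- matrix(c(x[1], m[1] - x[1], k - x[1], N - m[1] - k + x[1]), nrow = 2)
p_fisher_A <- fisher.test(tab_A, alternative = "greater")$p.value
```

If a single overall test is preferred, chisq.test(x, p = m / N) compares the top-10 counts with each group's overall proportion rather than with equal proportions, although with only 10 rows the chi-squared approximation is rough.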


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Tue 08 Apr 2008 - 22:30:27 GMT.
