Re: [R] how to check if a variable is preferentially present in a sample

From: Jorge Velez <jorgeivanvelez_at_gmail.com>
Date: Tue, 08 Apr 2008 16:56:36 -0400

Hi Tania,

I think a chi-squared test could work here. I tried a solution on your data set using a chi-squared goodness-of-fit test. Here is what I got:

# ----------------

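# Data set
# (val has 10 values that are recycled over the 100 rows; group is a factor
# with all five levels so that empty groups still show up in table().
# Since R 3.6.0, sample() uses a different default algorithm, so newer R
# versions may not reproduce the counts shown below.)
set.seed(123)
d <- data.frame(val = rnorm(10)^2,
                group = factor(sample(LETTERS[1:5], 100, replace = TRUE),
                               levels = LETTERS[1:5]))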

# Rank "d" in decreasing order of "val", keep the top 10 rows and count
# the number of observations in each group
TABLE <- table(d[order(d$val, decreasing = TRUE), ][1:10, "group"])
TABLE
A B C D E
3 2 3 1 1

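# Chi-squared goodness-of-fit test against equal probabilities for the five
# groups (the chisq.test() default). The expected count is 10/5 = 2 per group,
# below 5, so R also warns that the approximation may be incorrect.
cht <- chisq.test(TABLE)
cht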

Chi-squared test for given probabilities

data: TABLE
X-squared = 2, df = 4, p-value = 0.7358

cht$p.value
[1] 0.7357589
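A possible refinement, not part of the original reply: chisq.test() compares the counts with equal group probabilities by default, whereas the question is whether a group is over-represented relative to how common it is in the whole data set. The minimal sketch below assumes the same simulated data (with group stored as a factor), passes the overall group proportions as `p`, and uses `simulate.p.value = TRUE` to get a Monte Carlo p-value because the expected counts are too small for the chi-squared approximation. The variable names are illustrative. It still treats the top 10 rows as independent draws and ignores that they are sampled without replacement from the 100 rows, which is what a hypergeometric test accounts for.

```r
# Same simulated data as above; group is a factor so all five levels are counted.
set.seed(123)
d <- data.frame(val = rnorm(10)^2,  # 10 values, recycled over the 100 rows
                group = factor(sample(LETTERS[1:5], 100, replace = TRUE),
                               levels = LETTERS[1:5]))

# Group counts among the 10 rows with the largest "val"
top10 <- d[order(d$val, decreasing = TRUE), ][1:10, "group"]
observed <- table(top10)

# Share of each group in all 100 rows, used as the null probabilities
background <- as.vector(prop.table(table(d$group)))

# Monte Carlo p-value instead of the chi-squared approximation
cht2 <- chisq.test(observed, p = background, simulate.p.value = TRUE, B = 10000)
cht2
cht2$p.value
```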

Hope this helps,

Jorge

On Tue, Apr 8, 2008 at 11:24 AM, Tania Oh <tania.oh_at_bnc.ox.ac.uk> wrote:

> Dear All,
>
> I do apologise if this question is out of place for this list but I've
> tried searching mailing lists and read "Introductory Statistics with
> R" by Peter Dalgaard, but couldn't find any hints on solving my
> question below:
>
> I have a data frame (d) of values which I will rank in decreasing
> order of "val". Each value belongs to a group, either 'A', 'B', 'C',
> 'D', or 'E'. I then take the first 10 entries in data frame 'd' and
> count the number of occurrences for each of the groups. I want to
> test if certain groups occur more frequently than by chance in my
> first 10 entries. Would a chi-square test or a hypergeometric test be
> more suitable? If neither, what would be an alternative solution in
> R? Below is my data:
>
>
> ## data
> L5 <- LETTERS[1:5]
> d <- data.frame(cbind(val= rnorm(1:10)^2, group=sample(L5,100,
> repl=TRUE)))
>
> str(d)
> ## 'data.frame': 100 obs. of 2 variables:
> ##  $ val  : Factor w/ 10 levels "0.000169268449333046",..: 10 3 5 6 1 2 7 8 4 9 ...
> ##  $ group: Factor w/ 5 levels "A","B","C","D",..: 4 4 4 5 3 1 5 2 1 2 ...
>
>
> Many thanks in advance and apologies again,
> tania
>
> D.Phil. student
> Department of Physiology, Anatomy and Genetics
> University of Oxford
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
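For the hypergeometric test Tania asks about, a minimal sketch (not from the thread) could run one one-sided test per group. Under the null hypothesis the top 10 rows are a random subset of the 100 rows, so the number of rows from a given group among them follows a hypergeometric distribution, and phyper() gives the probability of seeing at least the observed count. The sketch assumes the same simulated data with group stored as a factor; the variable names are illustrative, and the Benjamini-Hochberg adjustment via p.adjust() is an added choice because five groups are tested. A one-sided fisher.test(..., alternative = "greater") on the corresponding 2x2 table gives the same p-value for each group.

```r
# Same simulated data as in the thread; group is a factor so that every
# level is counted even when it is absent from the top rows.
set.seed(123)
d <- data.frame(val = rnorm(10)^2,  # 10 values, recycled over the 100 rows
                group = factor(sample(LETTERS[1:5], 100, replace = TRUE),
                               levels = LETTERS[1:5]))

n_top <- 10
top <- d[order(d$val, decreasing = TRUE), ][seq_len(n_top), "group"]

N <- nrow(d)                          # rows in the whole data set (the urn)
in_all <- as.vector(table(d$group))   # rows per group in the whole data set
in_top <- as.vector(table(top))       # rows per group among the top n_top rows

# Upper tail P(X >= in_top): phyper(q, m, n, k, lower.tail = FALSE) returns
# P(X > q), hence q = in_top - 1.
p_enrich <- phyper(in_top - 1, in_all, N - in_all, n_top, lower.tail = FALSE)

res <- data.frame(group = levels(d$group), in_top = in_top, in_all = in_all,
                  p = p_enrich, p_adj = p.adjust(p_enrich, method = "BH"))
res
```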




Received on Tue 08 Apr 2008 - 21:00:14 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

