Re: [R] resampling from distributions

From: Grant Gillis <grant.j.gillis_at_gmail.com>
Date: Sat, 19 Apr 2008 13:37:36 -0700

I am sorry for the incorrect subject. My subject autofilled without my noticing in time. I suppose a better subject would be "Calculating proportion of shared occurrences and randomizations".

Grant

2008/4/19 Grant Gillis <grant.j.gillis_at_gmail.com>:

> Hello All,
>
> Once again thanks for all of the help to date. I am climbing my R
> learning curve. I've got a few more questions that I hope I can get some
> guidance on though. I am not sure whether the etiquette is to break up
> multiple questions or not but I'll keep them together here for now as it may
> help put the questions in context despite the fact that the post may get a
> little long.
>
>
> Question 1:
>
>
> My first goal is to calculate the proportion of shared 1) behaviours and
> 2) alleles between numerous individuals. Pasted below (the 'propshared'
> function) is what I have now, and it works very well for calculating the
> proportion of shared behaviours where the data are formatted with each column
> as a behaviour and each row as an individual. Microsatellite genotypes are
> formatted differently. An example is below. Each row is an individual and
> each column is one allele from a single locus. In the values below, L1
> and L1.1 each give one copy of an allele for the same locus. Occasionally
> alleles at different loci will have the same value, although these are not
> actually the same allele.
>
> I would like the calculation of the proportion of shared values for
> alleles to be restricted to the proportion of shared alleles within loci for
> all individuals (pairs of columns L1 and L1.1, L2 and L2.1, ...). What I have
> now calculates the proportion of shared values for alleles across loci. A
> specific example is that I would like the value *2* for individual *w* at
> *L1* to be considered the same as the value *2* for individual *y* at
> *L1.1* but not the same as the value *2* for any other individual within
> any other pair of columns.
>
>
> # Two columns per locus: L1/L1.1, L2/L2.1, L3/L3.1
> genos <- data.frame(
>   L1   = c(2, NA, 1, 3),
>   L1.1 = c(1, NA, 2, 3),
>   L2   = c(5, 2, 5, 3),
>   L2.1 = c(3, 4, 2, 4),
>   L3   = c(4, 5, 7, 2),
>   L3.1 = c(4, 6, 6, 6) )
>
> rownames(genos) <- c("w","x","y","z")
>
> > genos
>    L1 L1.1 L2 L2.1 L3 L3.1
> w   2    1  5    3  4    4
> x  NA   NA  2    4  5    6
> y   1    2  5    2  7    6
> z   3    3  3    4  2    6
>
>
>
> propshared <- function(genos) {
>   # Proportion of columns in which two individuals carry the same value
>   x <- sapply(rownames(genos), function(ind1)
>     sapply(rownames(genos), function(ind2)
>       sum(genos[ind1, ] == genos[ind2, ], na.rm = TRUE))) / ncol(genos)
>   is.na(diag(x)) <- TRUE   # blank out self-comparisons
>   x
> }
>
> > propshared(genos)
>           w         x         y         z
> w        NA 0.0000000 0.1666667 0.1666667
> x 0.0000000        NA 0.1666667 0.3333333
> y 0.1666667 0.1666667        NA 0.3333333
> z 0.1666667 0.3333333 0.3333333        NA
>
>
> The matrix I would like to have would look like this.
>             w           x           y           z
> w          NA           0 0.333333333 0.166666667
> x           0          NA 0.166666667 0.166666667
> y 0.333333333 0.166666667          NA 0.166666667
> z 0.166666667 0.166666667 0.166666667          NA
>
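One way to restrict the comparison to pairs of columns is sketched below. It is only an illustration under stated assumptions, not a definitive answer: the helper names (shared_at_locus, propshared_loci) are made up for this example, the columns are assumed to sit in adjacent pairs (1-2, 3-4, ...), each allele copy is assumed to match at most once, and NA is assumed never to match anything.

```r
# Example data from the post: two allele columns per locus.
genos <- data.frame(
  L1 = c(2, NA, 1, 3), L1.1 = c(1, NA, 2, 3),
  L2 = c(5, 2, 5, 3),  L2.1 = c(3, 4, 2, 4),
  L3 = c(4, 5, 7, 2),  L3.1 = c(4, 6, 6, 6),
  row.names = c("w", "x", "y", "z")
)

# Number of alleles two genotypes share at ONE locus (0, 1 or 2).
# Assumption: each allele copy is matched at most once; NAs never match.
shared_at_locus <- function(a, b) {
  a <- a[!is.na(a)]
  b <- b[!is.na(b)]
  n <- 0
  for (v in a) {
    hit <- match(v, b)
    if (!is.na(hit)) {
      n <- n + 1
      b <- b[-hit]          # use up the matched copy
    }
  }
  n
}

# Proportion of alleles shared within loci for every pair of individuals.
# Assumption: columns come in adjacent pairs, one pair per locus.
propshared_loci <- function(genos) {
  g <- as.matrix(genos)
  first <- seq(1, ncol(g), by = 2)   # first column of each locus
  ids <- rownames(g)
  out <- matrix(NA_real_, length(ids), length(ids),
                dimnames = list(ids, ids))
  for (i in ids) for (j in ids) {
    if (i == j) next                 # leave the diagonal as NA
    total <- 0
    for (k in first) {
      total <- total + shared_at_locus(g[i, k:(k + 1)], g[j, k:(k + 1)])
    }
    out[i, j] <- total / ncol(g)
  }
  out
}

res <- propshared_loci(genos)
print(res)
```

Under this particular counting rule w and y share both L1 alleles and one L2 allele (3 of 6), so the result will not agree entry by entry with the hand-written target matrix; if a different definition of "shared" is intended, shared_at_locus is the only piece that would need to change.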
>
> Question 2: Thanks if you have made it this far. Next I would
> like to calculate a randomized value of the mean proportion of shared
> alleles. To do this I thought I would randomize the original data (genos
> above, say 1000 times), recalculate the proportion of shared alleles at each
> step and then take the mean (my attempt below). When I do this I get the
> same mean proportion of shared alleles (or behaviours) as the original for
> every randomization. I assume that this is due to some property of
> permuting this type of data that I do not know. Does anyone have a
> recommendation as to how I might get a value of the proportion of shared
> alleles if alleles were distributed (again within loci) at random?
>
>
> randomize <- function(genos){
> x <- apply(genos, 2, sample)
> rownames(x) <- rownames(genos)
> x
> }
>
>
> allele.permute <- function(genos, n) {
>   # 'perms' rather than 'list', so the base function list() is not masked
>   perms <- replicate(n, randomize(genos), simplify = FALSE)
>   lapply(perms, propshared)
> }
>
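A possible explanation for the constant mean: randomize() shuffles each column separately, so every column keeps exactly the same set of values, and propshared only ever compares values within the same column. The total number of matches over all pairs of individuals therefore cannot change, and neither can the mean of the matrix. The sketch below shows one alternative, assuming the intended null model is "alleles reassigned at random within a locus": both columns of a locus are pooled and reshuffled together, and sharing is scored within loci. The function names (pair_share, mean_share, shuffle_within_loci) are hypothetical, and missing values are shuffled along with the alleles, which may or may not be appropriate for real data.

```r
# Example data from the post, as a matrix with two columns per locus.
genos <- as.matrix(data.frame(
  L1 = c(2, NA, 1, 3), L1.1 = c(1, NA, 2, 3),
  L2 = c(5, 2, 5, 3),  L2.1 = c(3, 4, 2, 4),
  L3 = c(4, 5, 7, 2),  L3.1 = c(4, 6, 6, 6),
  row.names = c("w", "x", "y", "z")
))

# Proportion of alleles shared within loci by individuals i and j.
# Assumption: adjacent column pairs are loci; NAs never match.
pair_share <- function(g, i, j) {
  total <- 0
  for (k in seq(1, ncol(g), by = 2)) {
    a <- g[i, k:(k + 1)]
    b <- g[j, k:(k + 1)]
    a <- a[!is.na(a)]
    b <- b[!is.na(b)]
    # each allele value counts as often as it occurs in BOTH genotypes
    for (v in intersect(a, b)) total <- total + min(sum(a == v), sum(b == v))
  }
  total / ncol(g)
}

# Mean sharing over all distinct pairs of individuals.
mean_share <- function(g) {
  pairs <- combn(nrow(g), 2)
  mean(apply(pairs, 2, function(p) pair_share(g, p[1], p[2])))
}

# Pool the two columns of each locus and deal the alleles out again at random.
shuffle_within_loci <- function(g) {
  for (k in seq(1, ncol(g), by = 2)) {
    cols <- k:(k + 1)
    g[, cols] <- sample(g[, cols])   # permutes all 2 * nrow(g) entries
  }
  g
}

set.seed(1)                          # only to make the sketch repeatable
observed   <- mean_share(genos)
null_means <- replicate(1000, mean_share(shuffle_within_loci(genos)))

print(observed)
print(summary(null_means))
```

Because alleles can now move between the two columns of a locus, the way they are paired into individuals changes from one shuffle to the next, so the randomized means form a distribution against which the observed mean can be compared.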
>
>
>
>
>
> I hope this is clear. I appreciate all insights and input.
> Thanks
>
> Grant
>
>
>
>




R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sat 19 Apr 2008 - 20:40:07 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 19 Apr 2008 - 22:30:30 GMT.

