Re: [R] Problem Using the %in% command

From: jim holtman <jholtman_at_gmail.com>
Date: Wed, 20 Feb 2008 08:41:11 -0500

With the format you have, we first have to split the comma-separated genes apart and then cross-tabulate them with 'table'. Here is one way of doing it:

> x <- readLines(textConnection(" Function x
+ Function1               gene5, gene19, gene22, gene23
+ Function2                              gene1, gene7, gene19
+ Function3                   gene2, gene3, gene7, gene23"))
> closeAllConnections()
> # funny data; split it up. get rid of header
> x <- x[-1]
> # split on blanks
> x.b <- strsplit(x, "[[:blank:]]+")
> # recombine into a 'long' format
> x.c <- lapply(x.b, function(z) cbind(z[1], unlist(strsplit(z[-1], ","))))
> x.c <- do.call(rbind, x.c)

> table(list(x.c[,1], x.c[,2]))
           .2
.1          gene1 gene19 gene2 gene22 gene23 gene3 gene5 gene7
  Function1     0      1     0      1      1     0     1     0
  Function2     1      1     0      0      0     0     0     1
  Function3     0      0     1      0      1     1     0     1

>
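If the goal is the TRUE/FALSE matrix itself rather than counts, the same splitting idea can be combined with %in%. The following is only a sketch: it assumes func_gen has a character column named x holding the comma-separated genes and all_genes has a column named Genes. The column names and the literal data are reconstructed from the example in the quoted message, not taken from the real files. It also shows why matching the unsplit column finds nothing: each cell is one long string such as "gene5, gene19, gene22, gene23", which is not equal to any single gene name.

```r
# Hypothetical reconstruction of the two data frames described in the thread
func_gen <- data.frame(
  Function = c("Function1", "Function2", "Function3"),
  x = c("gene5, gene19, gene22, gene23",
        "gene1, gene7, gene19",
        "gene2, gene3, gene7, gene23"),
  stringsAsFactors = FALSE
)
all_genes <- data.frame(
  Genes = c("gene1", "gene2", "gene3", "gene5",
            "gene7", "gene19", "gene22", "gene23"),
  stringsAsFactors = FALSE
)

# Matching the unsplit cells: every cell is a single long string,
# so none of them equals an individual gene name
naive <- func_gen$x %in% all_genes$Genes

# Split each cell on commas (plus any following blanks) first
gene_lists <- strsplit(func_gen$x, ",[[:blank:]]*")
names(gene_lists) <- func_gen$Function

# For each function, ask which of the known genes occur in its list;
# sapply returns genes in rows, so transpose to get functions in rows
membership <- t(sapply(gene_lists, function(g) all_genes$Genes %in% g))
colnames(membership) <- all_genes$Genes

print(naive)
print(membership)
```

Comparing the count table above with zero (table(...) > 0) should give the same kind of logical matrix, with the gene columns sorted as character strings instead of in the order of all_genes.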

On 2/20/08, Paul Christoph Schröder <pschrode_at_alumni.unav.es> wrote:
> I'm sorry if I didn't write it the right way. I'm just starting in the world
> of R and it's not that easy at the beginning.
> I wrote it again with code and comments. I hope it is understandable now. Do
> you think I should post it again in this shape?
>
> func_gen <- read.delim(file, header=T) # contains functions (rows) and genes
> # (columns); func_gen is a data.frame
>
> #It looks like this:
> # Function x
> # Function1 gene5, gene19, gene22, gene23
> # Function2 gene1, gene7, gene19
> # Function3 gene2, gene3, gene7, gene23
>
> # Duplicates of genes exist between different functions. This is why the
> # "read.delim" command was used instead of the "read.table" command: because
> # of the "duplicate 'row.names' are not allowed" error.
>
> all_genes # contains all genes from the above data frame; all_genes is a data.frame
> #It looks like this:
> # Genes
> # gene1
> # gene2
> # gene3
> # gene5
> # gene7
> # gene19
> # gene22
> # gene23
>
> func_gen[,2] %in% all_genes # this should result in a true-false matrix
> # Like this:
> # Function  gene1 gene2 gene3 gene5 gene7 gene19 gene22 gene23
> # Function1 F     F     F     T     F     T      T      T
> # Function2 T     F     F     F     T     T      F      F
> # Function3 F     T     T     F     T     F      F      T
>
> #and instead I obtain a true-false matrix with only FALSE-values.
>
> Thanks in advance!
> Paul
>
>
> --
> Paul C. Schröder
> PhD-Student
> Division of Proteomics, Genomics & Bioinformatics
> Center for Applied Medicine (CIMA)
> University of Navarra
> Avda. Pio XII, 55
> E-31008 Pamplona, Spain
> Tel: +34 948 194700, ext 5023
> email: pschrode_at_alumni.unav.es

>
>
> jim holtman wrote:
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> > It is hard to give a solution if we don't have the problem statement,
> > or an example of the data structures you are using.
> >
> > On Feb 20, 2008 6:57 AM, Paul Christoph Schröder
> > <pschrode_at_alumni.unav.es> wrote:

> > > Hello all!
> > >
> > > I have the following problem with the %in% command:
> > >
> > >   1. I have a data frame that consists of functions (rows) and genes
> > >      (columns). The whole thing was loaded with the "read.delim" command
> > >      because of gene duplications between the different rows.
> > >   2. Now, there is another data frame that contains all the genes (only
> > >      the genes and without duplicates) from all the functions of the
> > >      above data frame.
> > >
> > > What I want to do now is to use the "%in%" command to obtain a
> > > TRUE-FALSE data frame. This should be a data frame where, for every
> > > function, some genes are TRUE and some are FALSE depending on whether
> > > or not they were in the specific function when matched against the
> > > "all genes" data frame.
> > >
> > > The main problem I have is the way the genes are stored in the first
> > > data frame. I used the "unlist" command to separate them at the commas
> > > ",". But every time I do the match between the first and second data
> > > frame it returns FALSE for every gene in every function.
> > >
> > > Can anyone please give me a hint how to handle the problem? Thank
> > > you very much in advance!
> > >
> > > Paul

> > > ______________________________________________
> > > R-help_at_r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
Received on Wed 20 Feb 2008 - 13:55:03 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 20 Feb 2008 - 14:00:16 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.
