Re: [R] Separator with " | " for read.table

From: jim holtman <jholtman_at_gmail.com>
Date: Sun, 15 Jun 2008 21:39:25 -0400

I am not exactly sure what you are after, but if you are just printing out a single column, then it is returned as a vector unless you use "drop=FALSE" when referencing it:

> x <- read.table(textConnection("#GDS_ID GENE_NAME GENE_DESCRIPTION GENE_FUNCTION
+ 1007_s_at | DDR1 | discoidin domain receptor tyrosine kinase 1 | protein-coding
+ 1053_at | RFC2 | replication factor C (activator 1) 2, 40kDa | protein-coding
+ 117_at | HSPA6 | heat shock 70kDa protein 6 (HSP70B') | protein-coding"), sep="|", quote='')
> closeAllConnections()
> str(x)

'data.frame': 3 obs. of 4 variables:
 $ V1: Factor w/ 3 levels "1007_s_at ","1053_at ",..: 1 2 3
 $ V2: Factor w/ 3 levels " DDR1 "," HSPA6 ",..: 1 3 2
 $ V3: Factor w/ 3 levels " discoidin domain receptor tyrosine kinase 1 ",..: 1 3 2
 $ V4: Factor w/ 1 level " protein-coding": 1 1 1
> print(x$V3)
[1] discoidin domain receptor tyrosine kinase 1 replication factor C (activator 1) 2, 40kDa
[3] heat shock 70kDa protein 6 (HSP70B')
3 Levels: discoidin domain receptor tyrosine kinase 1 ... replication factor C (activator 1) 2, 40kDa
> x$V3

[1] discoidin domain receptor tyrosine kinase 1 replication factor C (activator 1) 2, 40kDa
[3] heat shock 70kDa protein 6 (HSP70B')
3 Levels: discoidin domain receptor tyrosine kinase 1 ... replication factor C (activator 1) 2, 40kDa
> x[, "V3", drop=FALSE] # is this what you were expecting
                                           V3
1 discoidin domain receptor tyrosine kinase 1
2 replication factor C (activator 1) 2, 40kDa
3        heat shock 70kDa protein 6 (HSP70B')
>
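
The blanks around each field (visible in the factor levels above) are left over because the data use " | " but the split is on "|" alone. A minimal sketch of one way to tidy that up, assuming the same three records: strip.white=TRUE trims the blanks and as.is=TRUE keeps the columns as character vectors rather than factors. The column names are taken from the header comment in your file; the variable names txt and con are only for illustration.

```r
# Same three records as above, held in a character vector (header line omitted).
txt <- c(
  "1007_s_at | DDR1 | discoidin domain receptor tyrosine kinase 1 | protein-coding",
  "1053_at | RFC2 | replication factor C (activator 1) 2, 40kDa | protein-coding",
  "117_at | HSPA6 | heat shock 70kDa protein 6 (HSP70B') | protein-coding"
)

con <- textConnection(txt)
x <- read.table(con, sep = "|", quote = "",
                strip.white = TRUE,  # trim blanks left around each field
                as.is = TRUE,        # keep character columns, no factors
                col.names = c("GDS_ID", "GENE_NAME",
                              "GENE_DESCRIPTION", "GENE_FUNCTION"))
close(con)

x$GENE_DESCRIPTION                     # a plain character vector
x[, "GENE_DESCRIPTION", drop = FALSE]  # still a one-column data frame
```

With character columns, print(x$GENE_DESCRIPTION) shows quoted strings and no "Levels:" line, which may be closer to what you expected.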

On Sun, Jun 15, 2008 at 9:31 PM, Gundala Viswanath <gundalav_at_gmail.com> wrote:
> Thanks so much Jim,
>
> It works. However, how come the "\n" was not removed?
> Meaning when I do:
>
> print (x$V3)
>
> it gives something like this:
> __OUTPUT__
> [1] discoidin domain receptor tyrosine kinase 1
>
> [2] replication factor C (activator 1) 2, 40kDa
>
> [3] heat shock 70kDa protein 6 (HSP70B')
>
> __END__
>
> Note the spacing between the entries. I expect something like:
>
> [1] discoidin domain receptor tyrosine kinase 1
> [2] replication factor C (activator 1) 2, 40kDa
> [3] heat shock 70kDa protein 6 (HSP70B')
> __END__
>
> Do you have any idea how to fix this?
>
>
>
> - Gundala Viswanath
> Jakarta - Indonesia
>
>
> On Mon, Jun 16, 2008 at 10:19 AM, jim holtman <jholtman_at_gmail.com> wrote:
>> Does this give you what you want:
>>
>>> x <- read.table(textConnection("#GDS_ID GENE_NAME GENE_DESCRIPTION GENE_FUNCTION
>> + 1007_s_at | DDR1 | discoidin domain receptor tyrosine kinase 1 | protein-coding
>> + 1053_at | RFC2 | replication factor C (activator 1) 2, 40kDa | protein-coding
>> + 117_at | HSPA6 | heat shock 70kDa protein 6 (HSP70B') | protein-coding"), sep="|", quote='')
>>> closeAllConnections()
>>>
>>> x
>>          V1    V2                                          V3             V4
>> 1 1007_s_at  DDR1 discoidin domain receptor tyrosine kinase 1 protein-coding
>> 2   1053_at  RFC2 replication factor C (activator 1) 2, 40kDa protein-coding
>> 3    117_at HSPA6        heat shock 70kDa protein 6 (HSP70B') protein-coding
>>>
>>
>>
>> You had a quote character (') in your data, so you need quote='' in the read.table() call.
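
Another way around both the apostrophe and the three-character " | " separator, offered only as a sketch: read the file as raw lines and split each one on the literal string " | " with strsplit(fixed = TRUE), which does no quote processing at all. It assumes every record has exactly four fields; the names records and fields are hypothetical, and the data are inlined here in place of readLines("mydata.txt").

```r
# Inline copy of the posted data; with the real file: txt <- readLines("mydata.txt")
txt <- c(
  "#GDS_ID GENE_NAME GENE_DESCRIPTION GENE_FUNCTION",
  "1007_s_at | DDR1 | discoidin domain receptor tyrosine kinase 1 | protein-coding",
  "1053_at | RFC2 | replication factor C (activator 1) 2, 40kDa | protein-coding",
  "117_at | HSPA6 | heat shock 70kDa protein 6 (HSP70B') | protein-coding"
)

records <- txt[!grepl("^#", txt)]                 # drop the '#' header line
fields  <- strsplit(records, " | ", fixed = TRUE) # split on space-bar-space
stopifnot(all(sapply(fields, length) == 4))       # assumption: 4 fields per line

# Stack the pieces into a character matrix, then convert to a data frame.
geneinfo <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
names(geneinfo) <- c("GDS_ID", "GENE_NAME", "GENE_DESCRIPTION", "GENE_FUNCTION")

geneinfo$GENE_DESCRIPTION
```

Because the split is on the full " | " string, no padding blanks are left in the fields, at the cost of losing read.table's type conversion.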
>>
>> On Sun, Jun 15, 2008 at 9:11 PM, Gundala Viswanath <gundalav_at_gmail.com> wrote:
>>> Hi,
>>>
>>> I have the following data file to be parsed and captured as a data frame:
>>>
>>> __DATA__
>>> #GDS_ID GENE_NAME GENE_DESCRIPTION GENE_FUNCTION
>>> 1007_s_at | DDR1 | discoidin domain receptor tyrosine kinase 1 | protein-coding
>>> 1053_at | RFC2 | replication factor C (activator 1) 2, 40kDa | protein-coding
>>> 117_at | HSPA6 | heat shock 70kDa protein 6 (HSP70B') | protein-coding
>>>
>>> __END__
>>>
>>> In particular it is separated by " | " , namely - space, bar, space.
>>> However, I tried this to no avail:
>>>
>>> geneinfo <- read.table("mydata.txt", sep=" | ", comment.char="\#")
>>> print(geneinfo)
>>>
>>> I also tried sep="|", but it gave the wrong parsing. Please advise.
>>>
>>> - Gundala Viswanath
>>> Jakarta - Indonesia
>>>
>>> ______________________________________________
>>> R-help_at_r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem you are trying to solve?
>>
>

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

Received on Mon 16 Jun 2008 - 01:42:10 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 16 Jun 2008 - 02:30:45 GMT.
