Re: [R] .gct file

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Wed 20 Jul 2005 - 18:52:51 EST

On Tue, 19 Jul 2005, Marc Schwartz (via MN) wrote:

> For the TAB delimited columns, adjust the 'sep' argument to:
>
> read.table("data.gct", skip = 2, header = TRUE, sep = "\t")
>
> The 'quote' argument is by default:
>
> quote = "\"'"
>
> which should take care of the quoted strings and bring them in as a
> single value.
>
> The above presumes that the header row is also TAB delimited. If not,
> you may have to set 'skip = 3' to skip over the header row and manually
> set the column names.

Not quite. You can open a connection, skip two rows, read one line to get the column names, and then call read.table on the still-open connection to read the rest of the file, supplying the column names you just read.
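
A minimal sketch of that connection approach follows. It assumes a made-up sample file in which the header row is space-separated while the data rows are tab-separated; the file contents and the names 'path', 'hdr' and 'dat' are invented for illustration and are not from the poster's data.

```r
# Hypothetical sample file: two preamble lines, a space-separated header,
# then tab-separated data rows with a quoted description column.
path <- tempfile(fileext = ".gct")
writeLines(c("#1.2",
             "2 1",
             "NAME DESCRIPTION VALUE",
             "gene1\t\"a protein inducer\"\t1123",
             "gene2\t\"a kinase\"\t42"),
           path)

con <- file(path, open = "rt")        # open once; each read continues where the last stopped
invisible(readLines(con, n = 2))      # discard the version and matrix-size lines

# Read exactly one line and split it on white space to get the column names.
hdr <- scan(con, what = "", nlines = 1, quiet = TRUE)

# Read the remaining lines as tab-delimited data, using the names just read.
dat <- read.table(con, header = FALSE, sep = "\t", col.names = hdr,
                  stringsAsFactors = FALSE)
close(con)

str(dat)
```

Because the connection is already open, read.table starts at the current position rather than at the top of the file, so no 'skip' argument is needed.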

However, based on what we have been shown

read.table("data.gct", skip = 2, header = TRUE)

ought to work as the file looks as if it is white-space delimited (a tab is white space).
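
As a quick check of that claim, here is a small sketch on a made-up file that mixes tabs and spaces (the contents are invented for illustration, not taken from the real data). With the default sep = "", any run of white space separates fields, and the default 'quote' setting keeps the quoted description together as one value.

```r
# Hypothetical sample: space-separated header, tab-separated data row.
path <- tempfile(fileext = ".gct")
writeLines(c("#1.2",
             "1 1",
             "NAME DESCRIPTION VALUE",
             "gene1\t\"a protein inducer\"\t1123"),
           path)

# Default sep = "" treats tabs and spaces alike; quoted strings stay intact.
dat <- read.table(path, skip = 2, header = TRUE, stringsAsFactors = FALSE)
str(dat)
```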

>
> HTH,
>
> Marc Schwartz
>
>
> On Tue, 2005-07-19 at 13:52 -0400, mark salsburg wrote:
>> This is all extremely helpful.
>>
>> The data, it turns out, is a little atypical: the columns are tab-delimited
>> except for the description columns
>>
>>
>> DATA1.gct looks like this
>>
>> #1.2
>> 23 3423
>> NAME DESCRIPTION VALUE
>> gene1 "a protein inducer" 1123
>> ..... ................. ......
>>
>> How do I get R to read the data as tab-delimited, but read in the 2nd
>> column as one value based on the quotation marks?
>>
>> thanks..
>>
>> On 7/19/05, Marc Schwartz (via MN) <mschwartz@mn.rr.com> wrote:
>>> On Tue, 2005-07-19 at 13:16 -0400, mark salsburg wrote:
>>>> ok so the gct file looks like this:
>>>>
>>>> #1.2 (version number)
>>>> 7283 19 (matrix size)
>>>> Name Description Values
>>>> .... ....... ......
>>>>
>>>> How can I tell R to disregard the first two lines and start reading
>>>> at the 3rd line in this gct file? I would just delete them, but I do not
>>>> know how to open a .gct file
>>>>
>>>> thank you
>>>>
>>>> On 7/19/05, Duncan Murdoch <murdoch@stats.uwo.ca> wrote:
>>>>> On 7/19/2005 12:10 PM, mark salsburg wrote:
>>>>>> I have two files to compare, one is a regular txt file that I can read
>>>>>> in no prob.
>>>>>>
>>>>>> The other is a .gct file (How do I read in this one?)
>>>>>>
>>>>>> I tried a simple
>>>>>>
>>>>>> read.table("data.gct", header = T)
>>>>>>
>>>>>> How do you suggest reading in this file??
>>>>>>
>>>>>
>>>>> .gct is not a standard filename extension. You need to know what is in
>>>>> that file. Where did you get it? What program created it?
>>>>>
>>>>> Chances are the easiest thing to do is to get the program that created
>>>>> it to export in a well known format, e.g. .csv.
>>>>>
>>>>> Duncan Murdoch
>>>
>>>
>>> The above would be consistent with the info in my reply.
>>>
>>> I guess if the format is consistent, as per Mark's example above, you
>>> can use:
>>>
>>> read.table("data.gct", skip = 2, header = TRUE)
>>>
>>> which will start by skipping the first two lines and then reading in the
>>> header row and then the data.
>>>
>>> See ?read.table
>>>
>>> HTH,
>>>
>>> Marc Schwartz
>>>
>>>
>>>
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Wed Jul 20 19:01:27 2005
