Re: [R] Problem reading from a data frame

From: Marc Schwartz <marc_schwartz_at_comcast.net>
Date: Wed, 02 Jul 2008 11:52:25 -0500

It is not likely to be the factor issue alone, since gsub() handles a factor vector correctly:

> x <- factor(c("MT2342", "MT0982", "MT2874"))

> x
[1] MT2342 MT0982 MT2874
Levels: MT0982 MT2342 MT2874

> gsub("[^0-9]", "", x)
[1] "2342" "0982" "2874"

gsub() and friends already coerce their input to character internally, as the function source shows:

> gsub
function (pattern, replacement, x, ignore.case = FALSE, extended = TRUE,
    perl = FALSE, fixed = FALSE, useBytes = FALSE) {
    if (!is.character(x))
        x <- as.character(x)
    .Internal(gsub(as.character(pattern), as.character(replacement),
        x, ignore.case, extended, perl, fixed, useBytes))
}
<environment: namespace:base>
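
A minimal sketch (not part of the original reply) of why a bare factor is safe: a factor stores integer codes alongside its level labels, and as.character() on the factor itself returns the labels, so the coercion inside gsub() recovers the original strings. The variable name f is illustrative only.

```r
# A factor stores integer codes plus a vector of level labels
f <- factor(c("MT2342", "MT0982", "MT2874"))

# Levels are sorted, so the codes follow the sorted label order
levels(f)              # "MT0982" "MT2342" "MT2874"
as.integer(f)          # 2 1 3  (the internal codes)

# as.character() on a factor returns the labels, not the codes,
# which is why gsub() on a bare factor gives the expected result
as.character(f)        # "MT2342" "MT0982" "MT2874"
gsub("[^0-9]", "", f)  # "2342" "0982" "2874"
```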

More likely, 'PthwyGenes' is a single-row data frame whose columns are factors:

> x <- data.frame(A = "MT2342", B = "MT0982", C = "MT2874")

> x
       A      B      C
1 MT2342 MT0982 MT2874

> str(x)
'data.frame': 1 obs. of 3 variables:
  $ A: Factor w/ 1 level "MT2342": 1
  $ B: Factor w/ 1 level "MT0982": 1
  $ C: Factor w/ 1 level "MT2874": 1
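
Note that since R 4.0.0, data.frame() defaults to stringsAsFactors = FALSE, so on a current R the call above creates character columns and the problem does not appear. A sketch that requests factor columns explicitly, to reproduce the structure shown by str() in 2008:

```r
# Assumption: a current R (>= 4.0.0), where factor columns must be
# requested explicitly to match the str() output above
x <- data.frame(A = "MT2342", B = "MT0982", C = "MT2874",
                stringsAsFactors = TRUE)

str(x)                           # three one-level factor columns
nrow(x)                          # 1: a single-row data frame
vapply(x, nlevels, integer(1))   # A, B and C each have exactly 1 level
```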


Thus, when gsub() coerces 'x' to character, as documented, you get each factor column's internal integer code (here always 1, since each column has a single level) coerced to character, rather than its label:

> as.character(x[1, ])
[1] "1" "1" "1"

and then:

> gsub("[^0-9]", "", x[1, ])
[1] "1" "1" "1"
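
To make the codes-versus-labels distinction concrete, this sketch converts each column of the row separately instead of coercing the whole row at once. The names row, codes and labels are hypothetical, and stringsAsFactors = TRUE is assumed so that the columns are factors on any R version:

```r
x <- data.frame(A = "MT2342", B = "MT0982", C = "MT2874",
                stringsAsFactors = TRUE)
row <- x[1, ]

# Each column of the row is a one-level factor, so each stores code 1
codes <- vapply(row, as.integer, integer(1))

# Converting each factor column on its own recovers the labels instead
labels <- vapply(row, as.character, character(1))

codes    # A = 1, B = 1, C = 1
labels   # A = "MT2342", B = "MT0982", C = "MT2874"
```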

Instead, apply gsub() to each column separately:

> sapply(x[1, ], function(x) gsub("[^0-9]", "", x))
     A      B      C
"2342" "0982" "2874"

or, more simply, unlist() the row into a single vector before calling gsub():

> gsub("[^0-9]", "", unlist(x[1, ]))
[1] "2342" "0982" "2874"
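
Another option, not from the original thread, is to convert the factor columns to character once, up front, so that any later coercion sees the labels. This is a sketch; is_fac and digits are hypothetical names. When the data come from a file, passing stringsAsFactors = FALSE to read.table() or read.csv() avoids creating the factors in the first place.

```r
x <- data.frame(A = "MT2342", B = "MT0982", C = "MT2874",
                stringsAsFactors = TRUE)

# Find the factor columns and replace them with their character labels
is_fac <- vapply(x, is.factor, logical(1))
x[is_fac] <- lapply(x[is_fac], as.character)

# A row can now be flattened to a plain character vector safely;
# gsub() keeps the column names carried over by unlist()
digits <- gsub("[^0-9]", "", unlist(x[1, ]))
digits
```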

The key thing to remember is that a single row extracted from a data frame is not a vector: it is still a one-row data frame, that is, a list with one element per column.
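
A short sketch of that distinction (flat is a hypothetical name; stringsAsFactors = TRUE is assumed so the columns are factors on any R version). It also shows why the unlist() version works: unlist() combines a list of factors into a single factor whose levels are the union of the columns' levels, so gsub() is again handed a factor and its coercion yields the labels.

```r
x <- data.frame(A = "MT2342", B = "MT0982", C = "MT2874",
                stringsAsFactors = TRUE)

class(x[1, ])     # "data.frame": one row is still a list of columns
length(x[1, ])    # 3: one element per column
class(x[[1]])     # "factor": a single column is a vector-like object

# unlist() merges the one-level factors into one factor
flat <- unlist(x[1, ])
class(flat)                    # "factor"
unname(as.character(flat))     # "MT2342" "MT0982" "MT2874"
```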

HTH, Marc Schwartz

On 07/02/2008 10:51 AM, jim holtman wrote:

> Seems to work fine for me:
> 
>> x <- c("MT2342",    "MT0982",    "MT2874")
>> gsub("[^0-9]", "", x)
> [1] "2342" "0982" "2874"
> 
> You might have 'factors' so you should use as.character to convert to
> character strings:
> 
> gsub('[^0-9]','',as.character(PthwyGenes))
> 
> On Wed, Jul 2, 2008 at 10:24 AM,  <naw3_at_duke.edu> wrote:
>> Hi,
>>
>> I have a data frame with strings that have two letters and four numbers. When I
>> store a whole row as a new vector and try to remove the preceding letters using
>> the gsub command, it returns characters of single numbers that have no relation
>> to the numbers in each string. I also noticed that when I view the new vector
>> before using gsub, it includes the original headers from the data frame. For
>> example,
>>
>> The original row will contain (I'm not showing the headers):
>>
>> MT2342    MT0982    MT2874
>>
>> and after I use the command, 'gsub('[^0-9]','',PthwyGenes),' I get:
>>
>> "6"    "6"    "8"
>>
>> and this result no longer has any headers.
>>
>> Does anyone know why this happens and how I can fix it?
>>
>> Thanks,
>> -Nina

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Wed 02 Jul 2008 - 17:01:01 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.