Re: [R] replacing a factor value in a data frame

From: Dave Roberts <droberts_at_montana.edu>
Date: Sat 29 Oct 2005 - 02:01:14 EST

Federico,

     There doesn't appear to be an instance of the value you want to change in your example, so I had to improvise. Part of the problem may be that the dataframe is composed of factors, and it's not possible to set an element of a factor to a value that isn't already in its set of possible values, as given by the levels() function. So, if you want to change GC to CG, but CG does not already exist in the set of possible values, you'll have to add it first. E.g.

 > tmp <- data
 > levels(tmp[,30]) <- c(levels(data[,30]),'CG')

Then, if the problem occurs in only one column, it's an easy fix:

 > tmp[data=='GC'] <- 'CG'
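
To illustrate why the level has to be added first, here is a minimal sketch on a made-up factor (the name f and its values are invented for illustration and are not from Federico's data). It assumes nothing beyond base R:

```r
# Hypothetical factor, not taken from the original data
f <- factor(c("GC", "AA", NA, "GC"))

# 'CG' is not one of levels(g), so R warns and stores NA instead
g <- f
g[1] <- "CG"
is.na(g[1])                      # TRUE

# Add the level first, then the assignment goes through
h <- f
levels(h) <- c(levels(h), "CG")
h[which(h == "GC")] <- "CG"      # which() drops the NA from the index
as.character(h)                  # "CG" "AA" NA "CG"
```

Note that 'GC' is still listed in levels(h) afterwards, just unused.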

If GC occurs in multiple columns, you'll either have to change the levels for each column as I did just above, or work with a single column at a time. Since you don't have 30 columns in your example, let's pretend you want to change all the instances of 'CC' in data$V5 to 'XX':

 > tmp <- data
 > levels(tmp$V5) <- c(levels(data$V5),'XX')
 > tmp$V5[data$V5=='CC'] <- 'XX'
 > tmp

   V4 V5 V6 V7   V8   V9 V10
1  TT GG TT AC   AG   AG  TT
2  AT XX TT AA   AA   AA  TT
3  AT XX TT AC   AA <NA>  TT
4  TT XX TT AA   AA   AA  TT
5  AT CG TT CC   AA   AA  TT
6  TT XX TT AA   AA   AA  TT
7  AT XX TT CC <NA> <NA>  TT
8  TT XX TT AC   AG   AG  TT
9  AT XX TT CC   AG <NA>  TT
10 TT XX TT CC   GG   GG  TT

Notice that the instances of 'CC' in tmp$V7 did not change.
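
For the original GC/CG case, where the column also contains NAs, the error quoted below mentions missing values in the subscript, so the NAs in the logical row index may be what triggers it. The following is only a sketch under that assumption; the data frame dat and its column snp are hypothetical stand-ins for column 30, with invented values. It shows two possible workarounds: wrapping the comparison in which(), and renaming the level directly so that the two readings are merged.

```r
# Hypothetical stand-in for the problem column (values invented)
dat <- data.frame(snp = factor(c("CG", "GC", NA, "CC", "GC")))

# Option 1: which() keeps only the positions that are TRUE,
# so the NA never reaches the data frame subscript
fixed1 <- dat
fixed1[which(fixed1$snp == "GC"), "snp"] <- "CG"

# Option 2: rename the level; the duplicate 'CG' levels are merged
fixed2 <- dat
levels(fixed2$snp)[levels(fixed2$snp) == "GC"] <- "CG"

as.character(fixed1$snp)   # "CG" "CG" NA "CC" "CG"
levels(fixed2$snp)         # "CC" "CG"
```

Option 2 touches only the levels, so it never indexes rows at all and leaves the NAs where they are.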

HTH, Dave Roberts

Federico Calboli wrote:
> Hi All,
>
> I have the following problem, that's driving me mad.
>
> I have a dataframe of factors, from a genetic scan of SNPs. I DO have
> NAs in the dataframe, which would look like:
>
>    V4 V5 V6 V7   V8   V9 V10
> 1  TT GG TT AC   AG   AG  TT
> 2  AT CC TT AA   AA   AA  TT
> 3  AT CC TT AC   AA <NA>  TT
> 4  TT CC TT AA   AA   AA  TT
> 5  AT CG TT CC   AA   AA  TT
> 6  TT CC TT AA   AA   AA  TT
> 7  AT CC TT CC <NA> <NA>  TT
> 8  TT CC TT AC   AG   AG  TT
> 9  AT CC TT CC   AG <NA>  TT
> 10 TT CC TT CC   GG   GG  TT
>
>
> In the dataframe I have 1 column where one factor has been erroneously
> given alternative readings: CG and GC.
>
> I want to change the instances of GC to CG and I use the code:
>
> data[data[,30]=="GC", 30] = "CG"
>
> but get the error:
> Error in "[<-.data.frame"(`*tmp*`, all[, 30] == "GC", 30
> missing values are not allowed in subscripted as
>
> Any hints?
>
> Cheers,
>
> Federico
>

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
David W. Roberts                                     office 406-994-4548
Professor and Head                                      FAX 406-994-3190
Department of Ecology                         email droberts@montana.edu
Montana State University
Bozeman, MT 59717-3460

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Sat Oct 29 02:25:12 2005
