From: Gavin Simpson <gavin.simpson_at_ucl.ac.uk>

Date: Wed, 30 Jan 2008 16:22:57 +0000


On Wed, 2008-01-30 at 07:53 -0800, Arthur Steinmetz wrote:

> I don't understand this behavior. Why does every data point get
> trashed by data.matrix when there is one non-numeric element in the
> array? Thanks.

I suspect it is because your GDP variable is not what you think it is. What does str(temp) say about GDP? I'd guess it says something like this:

> str(temp)
'data.frame':   4 obs. of  2 variables:
 $ GDP   : Factor w/ 4 levels "2042.4","2052.5",..: 4 3 2 1
 $ CPIYOY: Factor w/ 4 levels "0.8","0.9","1.1",..: 4 2 1 3

which indicates that GDP is a factor.
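
To check every column at once rather than reading the str() output, something like the following may help. It is only a sketch: the data frame is a hypothetical reconstruction of the poster's data with both columns read in as text, and stringsAsFactors = TRUE is set explicitly because R 4.0.0 and later no longer turn strings into factors by default.

```r
# Hypothetical reconstruction of the problem data (both columns read as text)
temp <- data.frame(GDP    = c("2098.1", "2085.4", "2052.5", "2042.4"),
                   CPIYOY = c("garbage", "0.9", "0.8", "1.1"),
                   stringsAsFactors = TRUE)

# Class of each column; "factor" is what leads to integer codes in data.matrix
col_classes <- sapply(temp, class)
print(col_classes)

# Names of the columns that are factors
factor_cols <- names(temp)[sapply(temp, is.factor)]
print(factor_cols)
```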

As the following shows, if GDP is numeric then data.matrix does produce what you want. If GDP is a factor, however, you get the behaviour you observe.

> temp <- data.frame(GDP = c(2098.1, 2085.4, 2052.5, 2042.4),
+                    CPIYOY = c("garbage", "0.9", "0.8", "1.1"))
> str(temp)
'data.frame':   4 obs. of  2 variables:
 $ GDP   : num  2098 2085 2052 2042
 $ CPIYOY: Factor w/ 4 levels "0.8","0.9","1.1",..: 4 2 1 3
> temp
     GDP  CPIYOY
1 2098.1 garbage
2 2085.4     0.9
3 2052.5     0.8
4 2042.4     1.1
> data.matrix(temp)
        GDP CPIYOY
[1,] 2098.1      4
[2,] 2085.4      2
[3,] 2052.5      1
[4,] 2042.4      3

> temp2 <- temp
> temp2$GDP <- as.factor(temp2$GDP)
> data.matrix(temp2)
     GDP CPIYOY
[1,]   4      4
[2,]   3      2
[3,]   2      1
[4,]   1      3
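
The integers are not arbitrary: each one is the position of that value's label in the factor's sorted levels. A small sketch, using a stand-alone factor (the name gdp is just for illustration) built from the same four GDP values:

```r
# A factor built from the four GDP values, entered as text
gdp <- factor(c("2098.1", "2085.4", "2052.5", "2042.4"))

# Levels are the unique labels, sorted as strings
gdp_levels <- levels(gdp)
print(gdp_levels)

# The internal codes are positions in that sorted vector of levels
codes <- as.integer(gdp)
print(codes)

# Indexing the levels by the codes recovers the original labels
print(gdp_levels[codes])
```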

One option is to convert anything in temp$CPIYOY that is "garbage" to NA. Then, having made sure that temp$GDP is numeric and not a factor, use data.matrix, which will now do what you want.

> temp$CPIYOY[temp$CPIYOY == "garbage"] <- NA
> temp
     GDP CPIYOY
1 2098.1   <NA>
2 2085.4    0.9
3 2052.5    0.8
4 2042.4    1.1
> data.matrix(temp)
        GDP CPIYOY
[1,] 2098.1     NA
[2,] 2085.4      2
[3,] 2052.5      1
[4,] 2042.4      3
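
If the data are being read from a text file, it may be simpler to declare the marker as missing at read time, so that neither column becomes a factor in the first place. This is only a sketch under assumptions: the thread does not say where the data came from, so the CSV text below is made up from the values shown, and "garbage" is assumed to be the only non-numeric marker.

```r
# Hypothetical CSV content standing in for the poster's real data source
csv_text <- "GDP,CPIYOY
2098.1,garbage
2085.4,0.9
2052.5,0.8
2042.4,1.1"

# na.strings tells read.csv to treat "garbage" as a missing value,
# which lets CPIYOY be read as a numeric column
temp <- read.csv(text = csv_text, na.strings = c("NA", "garbage"))
str(temp)

# Both columns are numeric, so data.matrix keeps the actual values
m <- data.matrix(temp)
print(m)
```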

If temp$GDP is a factor, you can't just do as.numeric(temp$GDP), as this returns the factor's internal integer codes, which is the same behaviour as data.matrix. You need to convert to character and then to numeric:

> temp2$GDP <- as.numeric(as.character(temp2$GDP))
> temp2
     GDP CPIYOY
1 2098.1   <NA>
2 2085.4    0.9
3 2052.5    0.8
4 2042.4    1.1
> data.matrix(temp2)
        GDP CPIYOY
[1,] 2098.1     NA
[2,] 2085.4      2
[3,] 2052.5      1
[4,] 2042.4      3
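
Note that in the output above CPIYOY is still a factor, so its column holds the codes 2, 1 and 3 rather than 0.9, 0.8 and 1.1. The same character-then-numeric conversion can be applied to every factor column in one go. A rough sketch, in which factor_to_numeric is a made-up helper name and the data frame is a hypothetical reconstruction with both columns as factors:

```r
# Hypothetical reconstruction: both columns stored as factors
temp <- data.frame(GDP    = c("2098.1", "2085.4", "2052.5", "2042.4"),
                   CPIYOY = c("garbage", "0.9", "0.8", "1.1"),
                   stringsAsFactors = TRUE)

# Illustrative helper: convert a factor to numeric via its labels.
# Labels that are not numbers (e.g. "garbage") become NA; suppressWarnings
# hides the "NAs introduced by coercion" warning that would otherwise appear.
factor_to_numeric <- function(x) {
  if (is.factor(x)) suppressWarnings(as.numeric(as.character(x))) else x
}

# temp[] <- keeps temp as a data frame while replacing each column
temp[] <- lapply(temp, factor_to_numeric)

m <- data.matrix(temp)
print(m)
```

With all columns numeric before the call, data.matrix has no factors left to turn into codes.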

HTH

G

> > temp
>          GDP  CPIYOY
> 19540 2098.1 garbage
> 19632 2085.4     0.9
> 19724 2052.5     0.8
> 19814 2042.4     1.1
>
> > data.matrix(temp)
>       GDP CPIYOY
> 19540   4      4
> 19632   3      2
> 19724   2      1
> 19814   1      3
>
> I'd like garbage to become NA but I tried filtering the array to scrub
> the data but it has no effect. This illustrates it:
>
> > temp[1,2] <- NA
> > temp
>          GDP CPIYOY
> 19540 2098.1   <NA>
> 19632 2085.4    0.9
> 19724 2052.5    0.8
> 19814 2042.4    1.1
>
> > data.matrix(temp)
>       GDP CPIYOY
> 19540   4     NA
> 19632   3      2
> 19724   2      1
> 19814   1      3
>
> -- Art Steinmetz

