Re: [R] numeric coercion when one or more elements is non-numeric

From: Gavin Simpson <gavin.simpson_at_ucl.ac.uk>
Date: Wed, 30 Jan 2008 16:22:57 +0000

On Wed, 2008-01-30 at 07:53 -0800, Arthur Steinmetz wrote:
> I don't understand this behavior. Why does every data point get
> trashed by data.matrix when there is one non-numeric element in the
> array? Thanks.

I suspect it is because your GDP variable is not what you think it is. What does str(temp) say about GDP? I'll guess it says something like this:

> str(temp)
'data.frame': 4 obs. of 2 variables:
 $ GDP   : Factor w/ 4 levels "2042.4","2052.5",..: 4 3 2 1
 $ CPIYOY: Factor w/ 4 levels "0.8","0.9","1.1",..: 4 2 1 3

which indicates that GDP is a factor.

As the example below shows, if GDP is numeric then data.matrix does produce what you want. If GDP is a factor, however, you get the behaviour you observe.

> temp <- data.frame(GDP = c(2098.1, 2085.4, 2052.5, 2042.4), CPIYOY =
c("garbage", "0.9", "0.8", "1.1"))
> str(temp)

'data.frame': 4 obs. of 2 variables:
 $ GDP : num 2098 2085 2052 2042
 $ CPIYOY: Factor w/ 4 levels "0.8","0.9","1.1",..: 4 2 1 3
> temp
     GDP  CPIYOY
1 2098.1 garbage
2 2085.4     0.9
3 2052.5     0.8
4 2042.4     1.1

> data.matrix(temp)
        GDP CPIYOY
[1,] 2098.1      4
[2,] 2085.4      2
[3,] 2052.5      1
[4,] 2042.4      3

> temp2 <- temp
> temp2$GDP <- as.factor(temp2$GDP)
> data.matrix(temp2)
     GDP CPIYOY
[1,]   4      4
[2,]   3      2
[3,]   2      1
[4,]   1      3
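
The 4, 3, 2, 1 in the GDP column are not arbitrary: a factor stores integer codes that index into its sorted levels, and those codes are what comes back when a factor is coerced directly to a number. A minimal sketch of this, using a standalone vector (the names gdp, gdp_num and gdp_num2 are only for illustration and are not in the data above):

```r
# Illustrative vector, not taken from the original data frame
gdp <- factor(c("2098.1", "2085.4", "2052.5", "2042.4"))

levels(gdp)       # the levels are the sorted unique strings
as.integer(gdp)   # internal codes: position of each value in levels(gdp)

# Going through character recovers the original numbers
gdp_num <- as.numeric(as.character(gdp))

# Equivalent route: convert the levels once, then index them by the codes
gdp_num2 <- as.numeric(levels(gdp))[gdp]
```

The second form does the same job and only converts each distinct level once, which may matter for long vectors.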

One option is to convert anything in $CPIYOY that is "garbage" to NA and, having made sure that temp$GDP is numeric and not a factor, then use data.matrix, which will now do what you want.

> temp$CPIYOY[temp$CPIYOY == "garbage"] <- NA
> temp
     GDP CPIYOY
1 2098.1   <NA>
2 2085.4    0.9
3 2052.5    0.8
4 2042.4    1.1

> data.matrix(temp)
        GDP CPIYOY
[1,] 2098.1     NA
[2,] 2085.4      2
[3,] 2052.5      1
[4,] 2042.4      3
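
Note that CPIYOY in this matrix still holds factor codes (2, 1, 3) rather than 0.9, 0.8, 1.1, because the column is still a factor. If the data were read from a text file (an assumption on my part, the post does not say), one way to avoid the problem at source might be to declare "garbage" as a missing-value string when reading. A sketch, with made-up CSV text (csv_text, dat and m are hypothetical names) standing in for the real file:

```r
# Hypothetical CSV content standing in for the real input file
csv_text <- "GDP,CPIYOY
2098.1,garbage
2085.4,0.9
2052.5,0.8
2042.4,1.1"

# na.strings marks "garbage" as missing at read time, so the column
# is parsed as numeric rather than as a factor/character column
dat <- read.csv(text = csv_text, na.strings = c("NA", "garbage"))

str(dat)
m <- data.matrix(dat)
m
```

With both columns numeric from the start, data.matrix has nothing to recode.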

If temp$GDP is a factor, you can't just do as.numeric(temp$GDP), as this returns the factor's internal integer codes, which is the same behaviour you see with data.matrix. You need to convert to character first, then to numeric:

> temp2$CPIYOY[temp2$CPIYOY == "garbage"] <- NA
> temp2$GDP <- as.numeric(as.character(temp2$GDP))
> temp2
     GDP CPIYOY
1 2098.1   <NA>
2 2085.4    0.9
3 2052.5    0.8
4 2042.4    1.1

> data.matrix(temp2)
        GDP CPIYOY
[1,] 2098.1     NA
[2,] 2085.4      2
[3,] 2052.5      1
[4,] 2042.4      3
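
If every column is meant to be numeric, the character-then-numeric conversion can be applied to the whole data frame in one go. The following is only a sketch: to_numeric_df is a hypothetical helper (not part of base R), temp3 and m3 are made-up names, and stringsAsFactors = TRUE is set explicitly because the default changed to FALSE in R 4.0.0.

```r
# Hypothetical helper, not part of base R: convert every column to
# numeric via character, so factor levels are parsed as numbers and
# anything unparseable (e.g. "garbage") becomes NA
to_numeric_df <- function(df) {
  df[] <- lapply(df, function(col) {
    # suppressWarnings hides the "NAs introduced by coercion" warning
    suppressWarnings(as.numeric(as.character(col)))
  })
  df
}

# Both columns start out as factors, as in the original question
temp3 <- data.frame(GDP = c("2098.1", "2085.4", "2052.5", "2042.4"),
                    CPIYOY = c("garbage", "0.9", "0.8", "1.1"),
                    stringsAsFactors = TRUE)

m3 <- data.matrix(to_numeric_df(temp3))
m3
```

Silencing the coercion warning is a design choice for the sketch; leaving it on is safer if unexpected non-numeric entries should be noticed.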

HTH

G

> > temp
>          GDP  CPIYOY
> 19540 2098.1 garbage
> 19632 2085.4     0.9
> 19724 2052.5     0.8
> 19814 2042.4     1.1
>
> > data.matrix(temp)
>       GDP CPIYOY
> 19540   4      4
> 19632   3      2
> 19724   2      1
> 19814   1      3
>
> I'd like garbage to become NA. I tried filtering the array to scrub
> the data, but it has no effect. This illustrates it:
>
> > temp[1,2] <- NA
> > temp
>          GDP CPIYOY
> 19540 2098.1   <NA>
> 19632 2085.4    0.9
> 19724 2052.5    0.8
> 19814 2042.4    1.1
>
> > data.matrix(temp)
>       GDP CPIYOY
> 19540   4     NA
> 19632   3      2
> 19724   2      1
> 19814   1      3
>
> -- Art Steinmetz
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

Received on Wed 30 Jan 2008 - 16:35:02 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 30 Jan 2008 - 17:30:09 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.