Re: [R] regular expression for na.strings / read.table

From: Henrique Dallazuanna <wwwhsd_at_gmail.com>
Date: Tue, 12 Feb 2008 13:07:03 -0200

 as.data.frame(lapply(DATA, function(x){x[grep(pattern="\\*", x)]<-NA;x}))
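
A minimal sketch of the same column-wise idea, run end to end. The attachment test.txt is not in the archive, so the inline csv_text and its column names (id, value, code) are made up for illustration. Because a column that contained a star was read as text, the sketch also re-infers the column types with type.convert() once the starred cells are NA; that last step is an addition, not part of the one-liner above.

```r
# Hypothetical stand-in for test.txt (the real attachment is not archived).
csv_text <- "id,value,code
1,700,AUW
2,*389,RMN
3,78*,*LNM"

# Read everything as character so no column is turned into a factor.
DATA <- read.csv(text = csv_text, colClasses = "character")

# Column by column, set every cell that contains a literal '*' to NA.
DATA[] <- lapply(DATA, function(x) {
  x[grepl("*", x, fixed = TRUE)] <- NA
  x
})

# Re-infer column types now that the starred cells are gone.
DATA[] <- lapply(DATA, type.convert, as.is = TRUE)

str(DATA)
```

Assigning into DATA[] keeps the result a data frame with the original column names, which avoids the matrix that sapply() would build.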

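On the first idea in the question (handling the stars at read time): na.strings is a vector of literal strings compared against whole fields, not a regular expression, so it cannot express "any field containing a star". One possible workaround, sketched here under the assumption that no field is quoted or contains an embedded comma, is to clean the raw lines before parsing. The raw_lines vector is invented sample data; with the real file it would come from readLines("test.txt").

```r
# Hypothetical stand-in for readLines("test.txt").
raw_lines <- c("id,value,code",
               "1,700,AUW",
               "2,*389,RMN",
               "3,78*,*LNM")

# Replace every comma-separated field that contains a '*' with the literal NA.
# Assumption: no quoted fields with embedded commas.
clean_lines <- gsub("[^,]*\\*[^,]*", "NA", raw_lines)

# read.csv() now sees plain NA fields, which its default na.strings handles,
# so numeric columns are parsed as numbers straight away.
DATA <- read.csv(text = paste(clean_lines, collapse = "\n"))

str(DATA)
```
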
On 12/02/2008, jessica.gervais_at_tudor.lu <jessica.gervais_at_tudor.lu> wrote:
>
> Dear all,
>
> I am working with a csv file.
> Some values in the file are not valid; they are marked with a star '*',
> for example: *789.
>
> I have attached to this email an example file (test.txt) that looks like
> the data I have to work with.
>
>
> I see 2 possibilities, neither of which I can get to work in R:
>
> 1- First and easiest solution:
> Read the data with read.csv in R, and treat as NA strings all cells
> containing a star (*).
> Something that would look like this ...
>
> > DATA<-read.csv("test.txt",na.strings=list(length(grep("\\*",DATA,value=T))==0))
>
> > DATA
> X1 X.789 LNM. X78 X56 X89 X56.1 X100
> 1 2 700 AUW 78 56 89 56 100
> 2 3 400 TOC 78 56 89 56 10
> 3 4 389 RMN 78 56 89 56 *89
> 4 5 400 LNM 78 56 *452 56 100
> 5 6 200 UTC 78 *40 89 56 100
> 6 7 100 GAT 78 56 8 56 *100
> 7 8 79 *LNM 78 56 9 56 100
> 8 9 89 TCG 78 56 800 56 *100
> 9 10 78* LNM 78 56 89 56 100
>
>
> ...but which would actually work (the stars are still there)! Does anyone
> know how to do that?
>
> 2- Second solution:
> - first read the file with DATA<-read.csv("test.txt")
> - then replace all fields containing a * with NA by applying the following
> function to the object DATA:
> DATA_cleaned<-apply(DATA,c(1,2),function(x){if(length(grep("\\*",x,value=TRUE))==1){x<-NA}})
> DATA_cleaned
> X1 X.789 LNM. X78 X56 X89 X56.1 X100
> [1,] NULL NULL NULL NULL NULL NULL NULL NULL
> [2,] NULL NULL NULL NULL NULL NULL NULL NULL
> [3,] NULL NULL NULL NULL NULL NULL NULL NA
> [4,] NULL NULL NULL NULL NULL NA NULL NULL
> [5,] NULL NULL NULL NULL NA NULL NULL NULL
> [6,] NULL NULL NULL NULL NULL NULL NULL NA
> [7,] NULL NULL NA NULL NULL NULL NULL NULL
> [8,] NULL NULL NULL NULL NULL NULL NULL NA
> [9,] NULL NA NULL NULL NULL NULL NULL NULL
>
> The stars have disappeared, but so has everything else!
> The problem comes from the fact that if a field does not contain any *, the
> command
> if(length(grep("\\*",x,value=T))==1) returns NULL instead of FALSE!
>
> If you have any idea, please let me know!
>
> Many thanks,
>
> Jessica
> ____________________________________
>
> Jessica Gervais
> Mail: jessica.gervais_at_tudor.lu
>
> Resource Centre for Environmental Technologies,
> Public Research Centre Henri Tudor,
> Technoport Schlassgoart,
> 66 rue de Luxembourg,
> P.O. BOX 144,
> L-4002 Esch-sur-Alzette, Luxembourg
>
> (See attached file: test.txt)
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>
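
On the second attempt quoted above: the anonymous function passed to apply() returns the value of its if expression. With no else branch, that value is NULL whenever the condition is FALSE, so the original cell is never handed back; only the starred cells produce something (NA). A minimal sketch of the difference, using a made-up character matrix m and the hypothetical helper names no_else and star_to_na:

```r
# Made-up character matrix standing in for the data read from test.txt.
m <- matrix(c("700", "*389", "AUW", "*LNM"), nrow = 2,
            dimnames = list(NULL, c("value", "code")))

# An if without else evaluates to NULL when the condition is FALSE.
no_else <- function(x) if (grepl("*", x, fixed = TRUE)) x <- NA
is.null(no_else("700"))

# Returning x explicitly keeps unstarred cells and sets starred ones to NA.
star_to_na <- function(x) {
  if (grepl("*", x, fixed = TRUE)) x <- NA
  x
}
cleaned <- apply(m, c(1, 2), star_to_na)
cleaned
```

Note that apply() works on a matrix, so a data frame with mixed column types is coerced to character first; the column-wise version at the top of this message avoids that.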

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

Received on Tue 12 Feb 2008 - 15:29:54 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.