Re: [R] grep help needed

From: Denis Chabot <chabotd_at_globetrotter.net>
Date: Wed 27 Jul 2005 - 05:35:51 EST

Thanks for your help; the proposed solutions were much more elegant than what I was attempting. I adopted a slight modification of Tom Mulholland's solution with a piece from John Fox's solution, but many of you had very similar solutions.

require(maptools)
# Read the SIDS shapefile that ships with maptools
nc <- read.shape(system.file("shapes/sids.shp", package = "maptools")[1])
mappolys <- Map2poly(nc, as.character(nc$att.data$FIPSNO))
selected.shapes <- which(nc$att.data$SID74 > 20) # just to make it a smaller example
submap <- subset(mappolys, nc$att.data$SID74 > 20)

# Stack the vertices of each selected shape, tagging every row with PID and POS
final.data <- NULL
for (j in 1:length(selected.shapes)){
     temp.verts <- matrix(as.vector(submap[[j]]),ncol = 2)
     n <- length(temp.verts[,1])
     temp.order <- 1:n
     temp.data <- cbind(rep(j,n),temp.order,temp.verts)
     final.data <- rbind(final.data,temp.data)
     }

colnames(final.data) <- c("PID", "POS", "X", "Y")
final.data
my.data <- as.data.frame(final.data)
class(my.data) <- c("PolySet", "data.frame")
attr(my.data, "projection") <- "LL"

# Attribute table for the same shapes, keyed by PID
meta <- nc[2]$att.data[selected.shapes,]
PID <- seq(1,length(submap))
meta.data <- cbind(PID, meta)
class(meta.data) <- c("PolyData", "data.frame")
attr(meta.data, "projection") <- "LL"
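
The loop above grows final.data with rbind() on every pass, which can get slow with many shapes. A minimal sketch of the same reshaping without the loop follows. It assumes only a plain list of two-column coordinate matrices; the name `shapes` and its values are a hypothetical stand-in for submap, so the sketch runs without maptools.

```r
# Hypothetical stand-in for submap: a list of two-column (X, Y) matrices
shapes <- list(
  matrix(c(-55.99805, -56.00222, -56.01694,
            51.68817,  51.68911,  51.68911), ncol = 2),
  matrix(c(-57.76294, -57.76292,
            50.88770,  50.88693), ncol = 2)
)

# Number of vertices in each shape
n <- vapply(shapes, nrow, integer(1))

# Stack all coordinate matrices in one call instead of one rbind() per shape
xy <- do.call(rbind, shapes)

poly.set <- data.frame(
  PID = rep(seq_along(shapes), times = n),  # same ID for every row of a shape
  POS = sequence(n),                        # 1..n within each shape
  X   = xy[, 1],
  Y   = xy[, 2]
)
```

The class and "projection" attribute can then be set on poly.set exactly as is done for my.data above.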

It would be nice if a variant of this was incorporated into PBSmapping to make it easier to import data from shapefiles!

Thanks again for your help,

Denis Chabot
On 05-07-26 at 00:48, Mulholland, Tom wrote:

>> -----Original Message-----
>> From: r-help-bounces@stat.math.ethz.ch
>> [mailto:r-help-bounces@stat.math.ethz.ch]On Behalf Of Denis Chabot
>> Sent: Tuesday, 26 July 2005 10:46 AM
>> To: R list
>> Subject: [R] grep help needed
>>
>>
>> Hi,
>>
>> In another thread ("PBSmapping and shapefiles") I asked for an easy
>> way to read "shapefiles" and transform them into data that PBSmapping
>> could use. One person is exploring some ways of doing this, but it is
>> possible I'll have to do this "manually".
>>
>> With package "maptools" I am able to extract the information I need
>> from a shapefile but it is formatted like this:
>>
>> [[1]]
>> [,1] [,2]
>> [1,] -55.99805 51.68817
>> [2,] -56.00222 51.68911
>> [3,] -56.01694 51.68911
>> [4,] -56.03781 51.68606
>> [5,] -56.04639 51.68759
>> [6,] -56.04637 51.69445
>> [7,] -56.03777 51.70207
>> [8,] -56.02301 51.70892
>> [9,] -56.01317 51.71578
>> [10,] -56.00330 51.73481
>> [11,] -55.99805 51.73840
>> attr(,"pstart")
>> attr(,"pstart")$from
>> [1] 1
>>
>> attr(,"pstart")$to
>> [1] 11
>>
>> attr(,"nParts")
>> [1] 1
>> attr(,"shpID")
>> [1] NA
>>
>> [[2]]
>> [,1] [,2]
>> [1,] -57.76294 50.88770
>> [2,] -57.76292 50.88693
>> [3,] -57.76033 50.88163
>> [4,] -57.75668 50.88091
>> [5,] -57.75551 50.88169
>> [6,] -57.75562 50.88550
>> [7,] -57.75932 50.88775
>> [8,] -57.76294 50.88770
>> attr(,"pstart")
>> attr(,"pstart")$from
>> [1] 1
>>
>> attr(,"pstart")$to
>> [1] 8
>>
>> attr(,"nParts")
>> [1] 1
>> attr(,"shpID")
>> [1] NA
>>
>> I do not quite understand the structure of this data object (a list of
>> lists, I think), but at this point I resorted to printing it on the
>> console and importing that text into Excel for further cleaning, which
>> is easy enough. I'd like to complete the process within R to save time
>> and to circumvent Excel's limit of around 64000 lines. But I have a
>> hard time figuring out how to clean up this text in R.
>>
>> What I need to produce for PBSmapping is a file where each block of
>> coordinates shares one ID number, called PID, and a variable POS
>> indicates the position of each coordinate within a "shape". All other
>> lines must disappear. So the above would become:
>>
>> PID POS X Y
>> 1 1 -55.99805 51.68817
>> 1 2 -56.00222 51.68911
>> 1 3 -56.01694 51.68911
>> 1 4 -56.03781 51.68606
>> 1 5 -56.04639 51.68759
>> 1 6 -56.04637 51.69445
>> 1 7 -56.03777 51.70207
>> 1 8 -56.02301 51.70892
>> 1 9 -56.01317 51.71578
>> 1 10 -56.00330 51.73481
>> 1 11 -55.99805 51.73840
>> 2 1 -57.76294 50.88770
>> 2 2 -57.76292 50.88693
>> 2 3 -57.76033 50.88163
>> 2 4 -57.75668 50.88091
>> 2 5 -57.75551 50.88169
>> 2 6 -57.75562 50.88550
>> 2 7 -57.75932 50.88775
>> 2 8 -57.76294 50.88770
>>
>> First I imported this text file into R:
>> test <- read.csv2("test file.txt",header=F, sep=";", colClasses =
>> "character")
>>
>> I used sep=";" to ensure there would be only one variable in this
>> file, as it contains no ";".
>>
>> To remove lines that do not contain coordinates, I used the fact that
>> longitudes are expressed as negative numbers, so with my very limited
>> knowledge of grep searches, I thought of this, which is probably not
>> the best way to go:
>>
>> a <- rep("-", length(test$V1))
>> b <- grep(a, test$V1)
>>
>> This gives me a warning ("Warning message:
>> the condition has length > 1 and only the first element will be used
>> in: if (is.na(pattern)) {")
>> but seems to do what I need anyway.
>>
>> c <- seq(1, length(test$V1))
>> d <- c %in% b
>>
>> e <- test$V1[d]
>>
>> Partial victory, now I only have lines that look like
>> [1,] -57.76294 50.88770
>>
>> But I don't know how to go further: the number in square brackets can
>> be used for variable POS, after removing the square brackets and the
>> comma, but this requires a better knowledge of grep than I have.
>> Furthermore, I don't know how to add a PID (polygon ID) variable,
>> i.e. all lines of a polygon must have the same ID, as in the example
>> above (each time POS == 1, a new polygon starts and PID needs to
>> be incremented by 1, and PID is kept constant for lines where
>> POS != 1).
>>
>> Any help will be much appreciated.
>>
>> Sincerely,
>>
>> Denis Chabot
>
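
For the text-based route described in the quoted message, the warning arises because grep() expects a single pattern, not a vector of them. A minimal sketch of that route, assuming the printed output has been read in as a character vector (the `txt` values below are a shortened, hypothetical sample of such a file): one regular expression keeps the coordinate lines, sub() strips the brackets and comma to recover POS, and cumsum(POS == 1) produces a PID that increases each time a new polygon starts.

```r
# Shortened, hypothetical sample of the printed list, one element per line
txt <- c("[[1]]",
         "[,1] [,2]",
         "[1,] -55.99805 51.68817",
         "[2,] -56.00222 51.68911",
         "attr(,\"pstart\")",
         "[1] 1",
         "[[2]]",
         "[,1] [,2]",
         "[1,] -57.76294 50.88770",
         "[2,] -57.76292 50.88693",
         "[3,] -57.76033 50.88163")

# Keep only lines that start with a row label such as "[12,]"
coords <- grep("^ *\\[[0-9]+,\\]", txt, value = TRUE)

# Turn "[12,] x y" into "12 x y" so the row label becomes a plain number
cleaned <- sub("^ *\\[([0-9]+),\\] *", "\\1 ", coords)

# Parse the three whitespace-separated fields
parsed <- read.table(text = cleaned, col.names = c("POS", "X", "Y"))

# A new polygon starts wherever POS is 1, so a running count of those rows is the PID
parsed <- cbind(PID = cumsum(parsed$POS == 1), parsed)
```

The pattern requires digits followed by a comma inside the brackets, so header lines such as "[,1] [,2]" and attribute values such as "[1] 1" are dropped without relying on the sign of the longitude.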



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Received on Wed Jul 27 05:42:27 2005