Re: [R] Reading CSV file with unequal record length

From: Marc Schwartz <marc_schwartz_at_comcast.net>
Date: Wed, 02 Jul 2008 16:07:47 -0500

on 07/02/2008 01:55 PM Viswanathan Shankar wrote:
> Hello,
> I am having some difficulty reading a CSV file with unequal record lengths
> in R. The data have 26 columns and no header, and were generated
> with the following R call:
> write.table(schat,"schat.csv", sep=",", col.names=FALSE, append = TRUE)
>
> 1.0,1.0,0.0,0.1,0.1,0.1,0.2,0.2,0.3,0.3,0.4,0.4,0.5,0.6,0.7,0.8,0.9,1.0,1.2,1.5,1.9,2.7,,,,
>
> 1.0,2.0,0.0,0.1,0.1,0.2,0.2,0.3,0.3,0.4,0.5,0.5,0.6,0.7,0.8,0.9,1.1,1.2,1.4,1.6,1.9,2.2,2.7,,,
>
> 1.0,3.0,0.0,0.1,0.1,0.2,0.2,0.3,0.4,0.4,0.5,0.6,0.7,0.8,1.0,1.2,1.4,1.7,2.1,3.1,5.0,,,,,
>
> 1.0,4.0,0.0,0.1,0.1,0.2,0.2,0.3,0.3,0.4,0.5,0.6,0.7,0.7,0.9,1.0,1.2,1.4,1.7,2.2,3.0,,,,,
>
> 1.0,5.0,0.0,0.1,0.1,0.2,0.2,0.3,0.4,0.4,0.5,0.6,0.7,0.8,0.9,1.0,1.2,1.4,1.6,1.9,2.4,3.3,,,,
>
> 1.0,6.0,0.0,0.1,0.1,0.1,0.2,0.2,0.3,0.4,0.4,0.5,0.6,0.7,0.8,0.9,1.1,1.3,1.7,2.1,3.4,,,,,
>
> 1.0,7.0,0.0,0.1,0.1,0.2,0.3,0.3,0.4,0.5,0.5,0.6,0.7,0.8,0.9,1.1,1.2,1.4,1.7,2.0,2.5,3.3,5.5,,,
>
> 1.0,8.0,0.0,0.1,0.1,0.2,0.2,0.2,0.3,0.4,0.4,0.5,0.6,0.6,0.7,0.8,0.9,1.0,1.2,1.3,1.5,1.7,2.0,2.3,2.8,4.2
>
> 1.0,9.0,0.0,0.1,0.1,0.1,0.2,0.2,0.3,0.4,0.4,0.5,0.6,0.7,0.8,0.9,1.0,1.2,1.4,1.6,1.9,2.2,2.9,4.2,,
>
> 1.0,10.0,0.0,0.1,0.1,0.2,0.2,0.3,0.3,0.4,0.5,0.6,0.8,1.0,1.3,1.6,2.4,3.6,6.0,,,,,,,
>
>
> When I use the following call to read the data written above:
>
> schat_n<-data.frame(read.table("schat.csv", sep=",", header = FALSE,
> fill=TRUE))
>
> the data are fine up to record 7, but records 8 and 9 get wrapped: the
> number of columns is limited to 23 and the remaining values are placed
> in a second record, giving 12 records instead of 10, as shown below:
>
> 1.0,1.0,1.0,0.0,0.1,0.1,0.1,0.2,0.2,0.3,0.3,0.4,0.4,0.5,0.6,0.7,0.8,0.9,1.0,1.2,1.5,1.9,2.7,NA
>
> 2.0,1.0,2.0,0.0,0.1,0.1,0.2,0.2,0.3,0.3,0.4,0.5,0.5,0.6,0.7,0.8,0.9,1.1,1.2,1.4,1.6,1.9,2.2,2.7
>
> 3.0,1.0,3.0,0.0,0.1,0.1,0.2,0.2,0.3,0.4,0.4,0.5,0.6,0.7,0.8,1.0,1.2,1.4,1.7,2.1,3.1,5.0,NA,NA
>
> 4.0,1.0,4.0,0.0,0.1,0.1,0.2,0.2,0.3,0.3,0.4,0.5,0.6,0.7,0.7,0.9,1.0,1.2,1.4,1.7,2.2,3.0,NA,NA
>
> 5.0,1.0,5.0,0.0,0.1,0.1,0.2,0.2,0.3,0.4,0.4,0.5,0.6,0.7,0.8,0.9,1.0,1.2,1.4,1.6,1.9,2.4,3.3,NA
>
> 6.0,1.0,6.0,0.0,0.1,0.1,0.1,0.2,0.2,0.3,0.4,0.4,0.5,0.6,0.7,0.8,0.9,1.1,1.3,1.7,2.1,3.4,NA,NA
>
> 7.0,1.0,7.0,0.0,0.1,0.1,0.2,0.3,0.3,0.4,0.5,0.5,0.6,0.7,0.8,0.9,1.1,1.2,1.4,1.7,2.0,2.5,3.3,5.5
>
> 8.0,1.0,8.0,0.0,0.1,0.1,0.2,0.2,0.2,0.3,0.4,0.4,0.5,0.6,0.6,0.7,0.8,0.9,1.0,1.2,1.3,1.5,1.7,2.0
>
> 9.0,2.3,2.8,4.2,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
> 10.0,1.0,9.0,0.0,0.1,0.1,0.1,0.2,0.2,0.3,0.4,0.4,0.5,0.6,0.7,0.8,0.9,1.0,1.2,1.4,1.6,1.9,2.2,2.9
>
> 11.0,4.2,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
> 12.0,1.0,10.0,0.0,0.1,0.1,0.2,0.2,0.3,0.3,0.4,0.5,0.6,0.8,1.0,1.3,1.6,2.4,3.6,6.0,NA,NA,NA,NA
>
>
> I would like the dataset to be read as is, with 10 records and 26
> columns. Any input on how to fix this would be greatly appreciated.
>
> Thank you in advance.
>
> Shankar

At least based upon the data that you posted above, I have no problem reading it:

DF <- read.table("clipboard", sep = ",")

 > DF
   V1 V2 V3  V4  V5  V6  V7  V8  V9 V10 V11 V12 V13 V14 V15 V16 V17 V18
1   1  1  0 0.1 0.1 0.1 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.6 0.7 0.8 0.9 1.0
2   1  2  0 0.1 0.1 0.2 0.2 0.3 0.3 0.4 0.5 0.5 0.6 0.7 0.8 0.9 1.1 1.2
3   1  3  0 0.1 0.1 0.2 0.2 0.3 0.4 0.4 0.5 0.6 0.7 0.8 1.0 1.2 1.4 1.7
4   1  4  0 0.1 0.1 0.2 0.2 0.3 0.3 0.4 0.5 0.6 0.7 0.7 0.9 1.0 1.2 1.4
5   1  5  0 0.1 0.1 0.2 0.2 0.3 0.4 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.2 1.4
6   1  6  0 0.1 0.1 0.1 0.2 0.2 0.3 0.4 0.4 0.5 0.6 0.7 0.8 0.9 1.1 1.3
7   1  7  0 0.1 0.1 0.2 0.3 0.3 0.4 0.5 0.5 0.6 0.7 0.8 0.9 1.1 1.2 1.4
8   1  8  0 0.1 0.1 0.2 0.2 0.2 0.3 0.4 0.4 0.5 0.6 0.6 0.7 0.8 0.9 1.0
9   1  9  0 0.1 0.1 0.1 0.2 0.2 0.3 0.4 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.2
10  1 10  0 0.1 0.1 0.2 0.2 0.3 0.3 0.4 0.5 0.6 0.8 1.0 1.3 1.6 2.4 3.6

   V19 V20 V21 V22 V23 V24 V25 V26
1  1.2 1.5 1.9 2.7  NA  NA  NA  NA
2  1.4 1.6 1.9 2.2 2.7  NA  NA  NA
3  2.1 3.1 5.0  NA  NA  NA  NA  NA
4  1.7 2.2 3.0  NA  NA  NA  NA  NA
5  1.6 1.9 2.4 3.3  NA  NA  NA  NA
6  1.7 2.1 3.4  NA  NA  NA  NA  NA
7  1.7 2.0 2.5 3.3 5.5  NA  NA  NA
8  1.2 1.3 1.5 1.7 2.0 2.3 2.8 4.2
9  1.4 1.6 1.9 2.2 2.9 4.2  NA  NA
10 6.0  NA  NA  NA  NA  NA  NA  NA
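
For anyone who wants to repeat this without the clipboard, here is a minimal sketch using the 'text' argument of read.table(). It uses only three of the posted records (rows 1, 2 and 8), copied verbatim; the name 'posted' is just a label chosen for this example.

```r
# Three of the posted records (rows 1, 2 and 8), one string per line
posted <- c(
  "1.0,1.0,0.0,0.1,0.1,0.1,0.2,0.2,0.3,0.3,0.4,0.4,0.5,0.6,0.7,0.8,0.9,1.0,1.2,1.5,1.9,2.7,,,,",
  "1.0,2.0,0.0,0.1,0.1,0.2,0.2,0.3,0.3,0.4,0.5,0.5,0.6,0.7,0.8,0.9,1.1,1.2,1.4,1.6,1.9,2.2,2.7,,,",
  "1.0,8.0,0.0,0.1,0.1,0.2,0.2,0.2,0.3,0.4,0.4,0.5,0.6,0.6,0.7,0.8,0.9,1.0,1.2,1.3,1.5,1.7,2.0,2.3,2.8,4.2"
)

# Every line carries 26 comma-separated fields (trailing empty fields
# included), so no 'fill' argument is needed; empty fields become NA
DF <- read.table(text = posted, sep = ",")

dim(DF)
```

Because the trailing commas give every line the same number of fields, data in this form should read as a rectangular table.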


Your use of 'append = TRUE' in the write.table() call for your actual data suggests that your real source file may contain output from more than one object, with differing structures. That would result in mixed input formats, which may be the problem.
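
To illustrate how that could happen, here is a rough sketch, not a reconstruction of your actual file. The 'records' list and the temporary file are made up for the example. read.table() works out the number of columns from the first five lines of input (see ?read.table), so a longer line further down no longer fits when 'fill = TRUE' is used. That would appear consistent with the 23 columns you report.

```r
# Hypothetical records of differing lengths, standing in for the real objects
records <- list(c(1, 1, 0.1), c(1, 2, 0.1), c(1, 3, 0.1),
                c(1, 4, 0.1), c(1, 5, 0.1), c(1, 6, 0.1, 0.2, 0.3))

ragged_file <- tempfile(fileext = ".csv")  # temporary file for the example

for (rec in records) {
  # Each call appends one row; rows end up with different numbers of fields
  write.table(t(rec), ragged_file, sep = ",", col.names = FALSE,
              row.names = FALSE, append = TRUE)
}

# Number of fields found on each line of the file
n_fields <- count.fields(ragged_file, sep = ",")
n_fields

# The column count is inferred from the first five lines only, so the
# longer sixth line does not fit the inferred three-column layout
wrapped <- read.table(ragged_file, sep = ",", header = FALSE, fill = TRUE)
dim(wrapped)
```

Comparing nrow(wrapped) with length(n_fields) is a quick way to check whether any line has been split across records.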

If the CSV file should only contain data from one R object, do not use 'append = TRUE'. Otherwise, be absolutely sure that the multiple objects have identical structures.
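
A minimal sketch of both options follows, using made-up records and temporary files rather than your data. The first part pads every record to a common length and writes a single object once. The second part is a reading-side workaround for an existing ragged file: passing 'col.names' tells read.table() how many columns to expect, as described in ?read.table.

```r
# Hypothetical records of differing lengths
records <- list(c(1, 1, 0.1), c(1, 2, 0.1, 0.2), c(1, 3, 0.1, 0.2, 0.3))

# Option 1: pad each record with NA to a common length, then write once
max_len <- max(sapply(records, length))
padded <- t(sapply(records, function(rec) c(rec, rep(NA, max_len - length(rec)))))

out_file <- tempfile(fileext = ".csv")
write.table(padded, out_file, sep = ",", col.names = FALSE, row.names = FALSE)
DF <- read.table(out_file, sep = ",", header = FALSE)
dim(DF)

# Option 2: the file is already ragged and cannot be regenerated
ragged_file <- tempfile(fileext = ".csv")
writeLines(c("1,1,0.1", "1,2,0.1", "1,3,0.1", "1,4,0.1", "1,5,0.1",
             "1,6,0.1,0.2,0.3"), ragged_file)

# Use the widest line to fix the column count up front
n_cols <- max(count.fields(ragged_file, sep = ","))
fixed <- read.table(ragged_file, sep = ",", header = FALSE, fill = TRUE,
                    col.names = paste0("V", seq_len(n_cols)))
dim(fixed)
```

The second option only changes how the file is read. If the appended objects really do differ in structure, it would still be worth checking that the columns line up as intended.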

HTH, Marc Schwartz



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Wed 02 Jul 2008 - 21:38:47 GMT

