Re: [R] how to import such data to R?

From: Marc Schwartz <MSchwartz_at_mn.rr.com>
Date: Sun 16 Oct 2005 - 02:43:37 EST

On Sat, 2005-10-15 at 23:54 +0800, ronggui wrote:
> It seems my last post was not sent successfully, so I am posting again.
>
> -------------
> The data file has the following structure:
>
> 1992 6245 49 . . 20 1
> 0 0 8.739536 0 . . .
> . . . . . "alabama"
> . 0 .
> 1993 7677 58 . . 15 1
> 0 0 8.945984 1 . 0 .2064476
> -5 0 . 0 8.739536 "alabama"
> 9 0 0
> 1992 13327 57 36 58 16 0
> 0 0 9.497547 0 47 . .
> . . . 0 . "arizona"
> . 0 .
> 1993 19860 57 36 58 16 1
> 1 0 9.896463 1 47 0 .3989162
> 0 1 0 1 9.497547 "arizona"
> 0 1 1
> 1992 10422 37 28 58 20 0
> 0 0 9.251675 0 43 . .
> . . . -1 . "arizona state"
> . 0 .
>
> ------snip-----
>
> The data description is:
>
> variable names:
>
> year apps top25 ver500 mth500 stufac bowl btitle
> finfour lapps d93 avg500 cfinfour clapps cstufac cbowl
> cavg500 cbtitle lapps_1 school ctop25 bball cbball
>
> Obs: 118
>
> 1. year 1992 or 1993
> 2. apps # applics for admission
> 3. top25 perc frosh class in 25th high sch percen
> 4. ver500 perc frosh >= 500 on verbal SAT
> 5. mth500 perc frosh >= 500 on math SAT
> 6. stufac student-faculty ratio
> 7. bowl = 1 if bowl game in prev year
> 8. btitle = 1 if men's cnf chmps prev year
> 9. finfour = 1 if men's final 4 prev year
> 10. lapps log(apps)
> 11. d93 =1 if year = 1993
> 12. avg500 (ver500+mth500)/2
> 13. cfinfour change in finfour
> 14. clapps change in lapps
> 15. cstufac change in stufac
> 16. cbowl change in bowl
> 17. cavg500 change in avg500
> 18. cbtitle change in btitle
> 19. lapps_1 lapps lagged
> 20. school university name
> 21. ctop25 change in top25
> 22. bball =1 if btitle or finfour
> 23. cbball change in bball
>
>
> So each group of four lines represents one case, and some variables are numeric while others are character.
> I thought scan() could read it in, but it seems somewhat tricky because of the mixed variable types. Any suggestions?

There may be an easier way, but here is one possible approach:

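First, use scan() to read in the data. Set the 'what' argument to a list of atomic data types matching your specs above: 19 numeric fields, one character field for 'school', then 3 more numeric fields. Also set the 'na.strings' argument to "." so that the lone dots are read as NA.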

Because 'what' is a list, scan() treats the input as records of length(what) = 23 fields each, so the four physical lines of each case are combined into a single record. Note also the 'multi.line' argument to scan(), which (at its default of TRUE) allows one record to span more than one line.

data <- scan("data.txt",
             what = c(rep(list(numeric(0)), 19),
                      list(character(0)),
                      rep(list(numeric(0)), 3)),
             na.strings = ".")
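To try this without a file on disk, here is a small sketch that feeds the first two records from your post to scan() through its 'text' argument. The object names (txt, what, rec) are only for illustration, and 'quiet = TRUE' just suppresses the "Read N records" message.

```r
# First two records from the post, four physical lines each
txt <- '1992 6245 49 . . 20 1
0 0 8.739536 0 . . .
. . . . . "alabama"
. 0 .
1993 7677 58 . . 15 1
0 0 8.945984 1 . 0 .2064476
-5 0 . 0 8.739536 "alabama"
9 0 0'

# 19 numeric fields, 1 character field (school), 3 numeric fields
what <- c(rep(list(numeric(0)), 19),
          list(character(0)),
          rep(list(numeric(0)), 3))

# Each record is filled from as many lines as it takes to get 23 fields
rec <- scan(text = txt, what = what, na.strings = ".", quiet = TRUE)

length(rec)   # number of fields per record
rec[[20]]     # the school column, read as character
rec[[14]]     # clapps: a lone "." becomes NA, ".2064476" stays numeric
```

Note that na.strings is matched against whole fields only, so a value such as .2064476 is not mistaken for a missing value.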


'data' is now a list, where each list element holds one column of your original data file. Now use as.data.frame(), which will take each list element and turn it into a column in a data frame, preserving the numeric data types. (In R versions before 4.0.0 the character column is converted to a factor unless you pass stringsAsFactors = FALSE.)

data <- as.data.frame(data)

Now, read in the column names for the data frame from a text file, containing your field names above, and set the data frame column names to these.

Names <- scan("names.txt", what = character(0))
names(data) <- Names
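As a possible variation, the names can be put on the 'what' list itself, in which case scan() returns a named list and the separate names file is not needed. The sketch below assumes the 23 variable names from your post and uses one record as inline text; 'vars', 'what', 'txt' and 'dat' are illustrative names only.

```r
# Variable names, in file order, as listed in the post
vars <- c("year", "apps", "top25", "ver500", "mth500", "stufac", "bowl",
          "btitle", "finfour", "lapps", "d93", "avg500", "cfinfour",
          "clapps", "cstufac", "cbowl", "cavg500", "cbtitle", "lapps_1",
          "school", "ctop25", "bball", "cbball")

# Start with all-numeric fields, then mark 'school' as character
what <- rep(list(numeric(0)), length(vars))
what[[match("school", vars)]] <- character(0)
names(what) <- vars

# One record from the post, spread over four lines
txt <- '1993 19860 57 36 58 16 1
1 0 9.896463 1 47 0 .3989162
0 1 0 1 9.497547 "arizona"
0 1 1'

# The result of scan() carries the names of 'what'
dat <- as.data.frame(scan(text = txt, what = what,
                          na.strings = ".", quiet = TRUE),
                     stringsAsFactors = FALSE)
str(dat)
```

With the real file, 'text = txt' would be replaced by the file name, as in the scan() call earlier in this message.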

Now review the contents of 'data':

> data
  year apps top25 ver500 mth500 stufac bowl btitle finfour lapps
1 1992  6245    49     NA     NA     20    1      0       0 8.739536
2 1993  7677    58     NA     NA     15    1      0       0 8.945984
3 1992 13327    57     36     58     16    0      0       0 9.497547
4 1993 19860    57     36     58     16    1      1       0 9.896463
5 1992 10422    37     28     58     20    0      0       0 9.251675
  d93 avg500 cfinfour    clapps cstufac cbowl cavg500 cbtitle  lapps_1
1   0     NA       NA        NA      NA    NA      NA      NA       NA
2   1     NA        0 0.2064476      -5     0      NA       0 8.739536
3   0     47       NA        NA      NA    NA      NA       0       NA
4   1     47        0 0.3989162       0     1       0       1 9.497547
5   0     43       NA        NA      NA    NA      NA      -1       NA
         school ctop25 bball cbball
1       alabama     NA     0     NA
2       alabama      9     0      0
3       arizona     NA     0     NA
4       arizona      0     1      1
5 arizona state     NA     0     NA
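One way to gain some confidence that the fields ended up in the right columns is to check the derived variables against their definitions in your description (lapps = log(apps), avg500 = (ver500 + mth500)/2). A rough sketch, using rows 3 to 5 of the output above retyped by hand into a hypothetical data frame 'chk':

```r
# Rows 3-5 of the printed output, retyped by hand for illustration
chk <- data.frame(apps   = c(13327, 19860, 10422),
                  lapps  = c(9.497547, 9.896463, 9.251675),
                  ver500 = c(36, 36, 28),
                  mth500 = c(58, 58, 58),
                  avg500 = c(47, 47, 43))

# lapps is described as log(apps); allow for rounding in the stored values
lapps_ok <- abs(chk$lapps - log(chk$apps)) < 1e-5

# avg500 is described as (ver500 + mth500) / 2
avg500_ok <- chk$avg500 == (chk$ver500 + chk$mth500) / 2

all(lapps_ok)
all(avg500_ok)
```

On the full data frame the same comparisons would need to allow for NA values, for example by wrapping them in all(..., na.rm = TRUE).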


HTH,

Marc Schwartz



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Sun Oct 16 02:48:28 2005
