From: Suraaga Kulkarni <suraaga.kulkarni_at_gmail.com>
Date: Sun, 30 Mar 2008 16:41:04 -0400


I need to resample characters from a dataset consisting of one extremely long string written across hundreds of thousands of lines, each 50 characters long. I currently do this by first inserting a space after each character in the dataset and then running the following commands:

    y <- as.matrix(read.table("data.txt", stringsAsFactors=FALSE))
    bstrap <- sample(length(y), 100000, TRUE)
    write(y[bstrap], file="Rep1.txt", ncolumns=50, append=FALSE)
    bstrap <- sample(length(y), 100000, TRUE)
    write(y[bstrap], file="Rep2.txt", ncolumns=50, append=FALSE)
    bstrap <- sample(length(y), 100000, TRUE)

and so on for 500 reps.

I think there should be a better way of doing this. My specific questions:

  1. Is there a way to avoid inserting spaces between the characters before calling "sample"? (I don't want spaces between the resampled characters in the output either; see number 2 below.)
  2. If I have no choice but to insert the spaces in my data before resampling, is there a way to output the resampled data without spaces, simply as 50-character strings one below the other? I tried adding strip.white=TRUE to the write call, but it gave an error because write does not recognize that argument.
  3. Finally, since I have to get 500 such resampled reps from each dataset (and there are over 20 such huge datasets), is there a way to avoid writing a separate write command for each rep?

Any suggestions will be greatly appreciated.
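
To make the three questions concrete, here is a minimal sketch of one way this might be done in base R. It assumes the file can be read with readLines and split into single characters with strsplit, so no spaces are inserted beforehand. The function names read_chars and write_rep are hypothetical, chosen only for illustration, and the file names follow the ones used above.

```r
# Hypothetical helper: read the file and split every line into single
# characters, so no spaces need to be inserted beforehand (question 1).
read_chars <- function(infile) {
  unlist(strsplit(readLines(infile), ""))
}

# Hypothetical helper: draw n characters with replacement and write them
# as lines of `width` characters with no separators (question 2).
write_rep <- function(chars, outfile, n = 100000, width = 50) {
  s <- sample(chars, n, replace = TRUE)
  # Assign each sampled character to a consecutive block of `width`.
  block <- ceiling(seq_along(s) / width)
  # Paste each block into one string; a final short block is kept as-is.
  lines <- vapply(split(s, block), paste, character(1), collapse = "")
  writeLines(lines, outfile)
}

# A loop replaces the 500 separate write commands (question 3).
# Guarded so the sketch does nothing if data.txt is not present.
if (file.exists("data.txt")) {
  chars <- read_chars("data.txt")
  for (i in 1:500) {
    write_rep(chars, sprintf("Rep%d.txt", i))
  }
}
```

The file is read once and the character vector is reused for every rep. The same loop could be wrapped in an outer loop over the 20 or so dataset file names.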



