Re: [Rd] tar R command

From: Henrik Bengtsson <hb_at_biostat.ucsf.edu>
Date: Sun, 28 Nov 2010 20:35:47 -0800

First, if you look carefully, you will see that argument 'files' should specify *filepaths*, i.e. directories and not specific files. Thus, if you, for instance, place your files in directory "foo/" and then call

tar("foo.tar", files="foo/");

you would do the right thing.

HOWEVER, looking at the internals of utils::tar(), it seems to be designed for non-Windows platforms, i.e. it will not work on Windows as it stands (more below). A workaround that also illustrates the problems is the following patch:

# PATCH for file.info() such that tar() works on Windows
tar <- utils::tar;
environment(tar) <- globalenv();
file.info <- function(...) {
  fi <- base::file.info(...);
  fi[setdiff(c("uid", "gid", "uname", "grname"), names(fi))] <- NA;
  fi;
} # file.info()

Example:

dir.create("foo/");

cat(file="foo/foo.txt", rep(letters, times=100));
tar("foo.tar", files="foo/");
str(file.info("foo.tar"));

'data.frame': 1 obs. of 11 variables:

 $ size  : num 7680
 $ isdir : logi FALSE
 $ mode  :Class 'octmode'  int 438
 $ mtime : POSIXct, format: "2010-11-28 20:24:05"
 $ ctime : POSIXct, format: "2010-11-28 20:03:56"
 $ atime : POSIXct, format: "2010-11-28 20:07:40"
 $ exe   : chr "no"
 $ uid   : logi NA
 $ gid   : logi NA
 $ uname : logi NA
 $ grname: logi NA

This seems to generate a valid foo.tar file.
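
One way to check that claim is to list the archive's contents instead of only looking at its size. The following is a minimal sketch, assuming a current R where both utils::tar() and utils::untar() accept tar = "internal"; the temporary working directory is only there to keep the example self-contained.

```r
# Work in a fresh temporary directory so the example leaves nothing behind.
wd <- tempfile("tar-demo")
dir.create(wd)
old <- setwd(wd)

# Same setup as the example above.
dir.create("foo")
cat(file = "foo/foo.txt", rep(letters, times = 100))

# Force R's internal implementation for both writing and listing.
utils::tar("foo.tar", files = "foo", tar = "internal")
contents <- utils::untar("foo.tar", list = TRUE, tar = "internal")
print(contents)  # should include an entry for foo/foo.txt

setwd(old)
```

If the listing does not mention foo/foo.txt, the archive is not usable, whatever its size.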

PROBLEMS:
Here are a few problems I have identified with tar().

PROBLEM #1:
The default for argument files=NULL is documented "to archive all files under the current directory". In reality it gives:

  Error in list.files(files, recursive = TRUE, all.files = TRUE, full.names = TRUE) : invalid 'directory' argument

because list.files(NULL) is invalid. The default should instead be files=".".
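
A minimal sketch of that fallback, using a hypothetical helper default_tar_files() (not part of R) that could be applied before the paths reach list.files():

```r
# Hypothetical helper (not part of R): map the documented default
# files = NULL onto the current directory before tar() ever sees it.
default_tar_files <- function(files = NULL) {
  if (is.null(files)) "." else files
}

default_tar_files()         # falls back to the current directory
default_tar_files("foo/")   # explicit paths pass through unchanged
```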

PROBLEM #2:
If passing a non-existing path (argument 'files'), then tar() generates an invalid *.tar file of size 1024 bytes (not empty as the OP says). It would be better to assert that each of the requested paths really exists and is a directory, e.g. using file.info()$isdir.
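
Such an assertion could look roughly like the sketch below. The helper name assert_directories() is made up for illustration and is not part of R; it relies only on file.info() returning NA in 'isdir' for paths that do not exist.

```r
# Hypothetical helper (not part of R): stop unless every path exists
# and is a directory.
assert_directories <- function(paths) {
  isdir <- file.info(paths)$isdir        # NA for paths that do not exist
  bad <- paths[is.na(isdir) | !isdir]    # missing paths and plain files
  if (length(bad) > 0L) {
    stop("Not an existing directory: ", paste(sQuote(bad), collapse = ", "))
  }
  invisible(paths)
}
```

Calling assert_directories(files) before building the archive would turn the silent 1024-byte result into an explicit error.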

PROBLEM #3:
tar() assumes that file.info() returns a data.frame with fields 'uid', 'gid', 'uname' and 'grname' (the ones the patch above fills in). That is not the case for file.info() on Windows.
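
To see which of these fields are actually absent on a given system, a quick check along these lines should do. It is only a sketch; the list of names is taken from the patch earlier in this message, and the result depends on the platform and R version.

```r
# Fields that the patch above fills in with NA when they are absent.
needed <- c("uid", "gid", "uname", "grname")

# Ask file.info() about the current directory and compare column names.
fi <- file.info(".")
missing_cols <- setdiff(needed, names(fi))
print(missing_cols)  # character(0) means all four fields are present
```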

> sessionInfo()

R version 2.12.0 Patched (2010-11-24 r53656)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:

[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

My $0.20

/Henrik

On Sun, Nov 28, 2010 at 7:00 PM, Dario Strbenac <D.Strbenac_at_garvan.org.au> wrote:
> Hello,
>
> The documentation for the tar command leads me to think there is an internal implementation when the command can't be found in the OS.
>
> However, it doesn't seem to be the case, as I get an empty .tar file generated on a small example I made :
>
>> dir(pattern = "jpg")
> [1] "MA56237502_635.jpg"
>> file.info("MA56237502_635.jpg")
>                     size isdir mode               mtime               ctime               atime exe
> MA56237502_635.jpg 229831 FALSE  666 2010-11-29 13:05:49 2010-11-29 13:00:36 2010-11-29 13:00:36  no
>> tar("example.tar", files = dir(pattern = "jpg"))
>> file.info("example.tar")
>            size isdir mode               mtime               ctime               atime exe
> example.tar 1024 FALSE  666 2010-11-29 13:43:29 2010-11-29 13:42:30 2010-11-29 13:42:30  no
>
> Is this an unimplemented feature ?
>
>> sessionInfo()
> R version 2.12.0 (2010-10-15)
> Platform: x86_64-pc-mingw32/x64 (64-bit)
> ...                ...               ...
>
> Thanks,
>       Dario.
>

> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

Received on Mon 29 Nov 2010 - 04:38:09 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 29 Nov 2010 - 13:20:29 GMT.
