Re: [Rd] R CMD build wiped my computer (from R-help)

From: Peter Dalgaard <pdalgd_at_gmail.com>
Date: Sun, 01 Aug 2010 23:49:40 +0200

Duncan Murdoch wrote:

> On 28/07/2010 8:10 PM, Ray Brownrigg wrote:

>> NOTE: Now submitted to R-devel, as this seems more appropriate.
>>
>> I may have spoken too soon about this having been fixed. (see below).
>>
>> If I create another "unusual but not 'invalid'" filename in the R subdirectory, the
>> behaviour is different from that reported below, and is similar to the original poster's
>> output (the third "unlink" command, where "xyz" was "~"):
>>
>> circa> ls -al RColorBrewer/R
>> total 140
>> -rwxr-xr-x 1 ray ecs 43988 Apr 17 2005 ColorBrewer.R~*
>> -rw-r--r-- 1 ray ecs 0 Jul 29 09:57 residuals.MCMCglmm.R?xyz
> 
> Ray clarified to me that this filename was "residuals.MCMCglmm.R" 
> preceded by 3 spaces and followed by a carriage return and "xyz".
> 

>> drwxr-xr-x 2 ray ecs 4096 Jul 29 12:02 ./
>> drwxr-xr-x 5 ray ecs 4096 Jul 29 11:49 ../
>> -rwxr-xr-x 1 ray ecs 43988 Jul 29 09:57 ColorBrewer.R*
>> -rwxr-xr-x 1 ray ecs 43988 Apr 17 2005 ColorBrewer.R~*
>> -rw-r--r-- 1 ray ecs 0 Jul 29 09:58 residuals.MCMCglmm.R
>> circa>
>> circa>
>> circa> R CMD build RColorBrewer
>> * checking for file 'RColorBrewer/DESCRIPTION' ... OK
>> * preparing 'RColorBrewer':
>> * checking DESCRIPTION meta-information ... OK
>> * checking whether 'INDEX' is up-to-date ... NO
>> * use '--force' to overwrite the existing 'INDEX'
>> * removing junk files
>> unlink RColorBrewer/R/ ColorBrewer.R~
>> unlink RColorBrewer/R/ColorBrewer.R
>> unlink RColorBrewer/R/ residuals.MCMCglmm.R
>> xyz
> 
> That certainly looks bad.  I can't reproduce it on Windows; it doesn't 
> allow that filename.  So I'll have to leave this for a Unix-alike user.


I have been following this from the sideline, because I suck really bad when it comes to Perl programming. However....

I'm seeing this stuff in the build script:

    ## Remove exclude files.
    open(EXCLUDE, "< $exclude");
    while(<EXCLUDE>) {
        rmtree(glob($_));
    }
    close(EXCLUDE);

Now this comes after

    find(\&find_exclude_files, "$pkgname");

which AFAICT prints a number of file names into EXCLUDE. Now if one of those file names contains a wildcard, I conjecture that the glob() can make weird things happen. I don't think we want to glob there, do we?
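
To illustrate the conjecture, here is a minimal shell sketch. It is an analogue, not the Perl build script itself. It assumes a scratch directory from mktemp and hypothetical file names (keep1.R, keep2.R, and a file literally named "*"). It shows a stored file name being re-expanded as a pattern when it is used unquoted. Perl's glob() follows csh conventions: besides expanding wildcards it splits its argument on whitespace and expands a leading "~" to the home directory. That would be consistent with a stray line holding only "~" reaching rmtree() as the home directory.

```shell
#!/bin/sh
# Sketch only: everything happens inside a throwaway directory.
work=$(mktemp -d) || exit 1
cd "$work" || exit 1

# Hypothetical package files, plus one file whose name is a lone asterisk.
touch keep1.R keep2.R
touch -- '*'

name='*'   # a file name that happens to be a wildcard

# Unquoted: the shell treats the stored name as a pattern and expands it.
set -- $name
unquoted_count=$#

# Quoted: the stored name is passed through literally.
set -- "$name"
quoted_count=$#

echo "unquoted: $unquoted_count names, quoted: $quoted_count name"

# Clean up the scratch directory.
cd / && rm -rf -- "$work"
```

Under a POSIX shell the unquoted form should match all three files in the directory, while the quoted form yields only the one name that was actually stored.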

Another issue is that EXCLUDE seems unprotected against file names with embedded newlines. Something like find's -print0 would be handy...
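
A rough sketch of the -print0 idea, again in shell rather than Perl. The scratch directory and the file names are assumptions modelled on Ray's example, with a newline standing in for the stray character. It compares a newline-delimited listing with a NUL-delimited one, then deletes only the editor backup by handing NUL-delimited names to rm through xargs -0. Both -print0 and xargs -0 are common extensions and are assumed to be available.

```shell
#!/bin/sh
# Sketch only: work inside a throwaway directory.
work=$(mktemp -d) || exit 1
mkdir "$work/R" || exit 1

# A variable holding a single newline character.
nl='
'

# One ordinary backup file and one name with an embedded newline.
touch "$work/R/ColorBrewer.R~"
touch "$work/R/residuals.MCMCglmm.R${nl}xyz"

# Newline-delimited list: the second name spans two lines,
# so a line-by-line reader sees three records for two files.
line_records=$(( $(find "$work/R" -type f -print | wc -l) ))

# NUL-delimited list: count the NUL terminators, one per file.
nul_records=$(( $(find "$work/R" -type f -print0 | tr -cd '\000' | wc -c) ))

# Delete only the editor backup, passing names NUL-delimited to rm.
find "$work/R" -type f -name '*~' -print0 | xargs -0 rm -f --
remaining=$(( $(find "$work/R" -type f -print0 | tr -cd '\000' | wc -c) ))

echo "lines: $line_records, NUL records: $nul_records, remaining: $remaining"

# Clean up the scratch directory.
rm -rf -- "$work"
```

The point of the design is that NUL cannot occur in a file name, so a NUL-terminated record always corresponds to exactly one file, whatever else the name contains.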

> Duncan Murdoch
> 

>> unlink RColorBrewer/R/residuals.MCMCglmm.R
>> unlink RColorBrewer/R/ColorBrewer.R~
>> rmdir RColorBrewer/R
>> * checking for LF line-endings in source and make files
>> * checking for empty or unneeded directories
>> * building 'RColorBrewer_1.0-3.tar.gz'
>>
>> circa>
>>
>> Ray Brownrigg
>>
>> On Thu, 29 Jul 2010, Ray Brownrigg wrote:
>>> On Thu, 29 Jul 2010, Duncan Murdoch wrote:
>>>> On 28/07/2010 10:01 AM, Jarrod Hadfield wrote:
>>>>> Hi Marc,
>>>>>
>>>>> Thanks for the info on recovery - most of it can be pieced together from
>>>>> backups but a quick, cheap and easy method of recovery would have been
>>>>> nicer.
>>>>>
>>>>> My main concern is that this could happen again and that the "bug" is
>>>>> not limited to R 2.9. I would think that an accidental carriage return
>>>>> at the end of a file name (even a temporary one) would be a reasonably
>>>>> common phenomenon (I'm surprised I hadn't done it before).
>>>> If you can put together a recipe to reproduce the problem (or a less
>>>> extreme version of R deleting files it shouldn't), we'll certainly fix
>>>> it. But so far all we've got are guesses about what might have gone
>>>> wrong, and I don't think anyone has been able to reproduce the problem
>>>> on current R.
>>> Duncan:
>>>
>>> It looks to me like it has already been fixed, if indeed that was the
>>> problem. In R-2.10.1, I tried to reproduce the problem (using
>>> RColorBrewer, since that was the smallest package I have a local copy of),
>>> and the build produced this:
>>>
>>> * removing junk files
>>> * excluding invalid files from 'RColorBrewer'
>>> Subdirectory 'R' contains invalid file names:
>>> residuals.MCMCglmm.R xyz
>>>
>>> where the space shown between the "R" and the "xyz" was a newline
>>> character. [I didn't dare try using a "~" :-)]
>>>
>>> Ray Brownrigg
>>>
>>>> Duncan Murdoch
>>>>
>>>>> Cheers,
>>>>>
>>>>> Jarrod
>>>>>
>>>>> On 28 Jul 2010, at 14:04, Marc Schwartz wrote:
>>>>>> Jarrod,
>>>>>>
>>>>>> Noting your exchange with Martin, Martin brings up a point that
>>>>>> certainly I missed, which is that somehow the tilde ('~') character
>>>>>> got into the chain of events. As Martin noted, on Linuxen/Unixen
>>>>>> (including OSX), the tilde, when used in the context of file name
>>>>>> globbing, refers to your home directory. Thus, a command such as:
>>>>>>
>>>>>> ls ~
>>>>>>
>>>>>> will list the files in your home directory. Similarly:
>>>>>>
>>>>>> rm ~
>>>>>>
>>>>>> will remove the files there as well. If the -rf argument is added,
>>>>>> then the deletion becomes recursive through that directory tree,
>>>>>> which appears to be the case here.
>>>>>>
>>>>>> I am unclear, as Martin appears to be, as to the steps that caused
>>>>>> this to happen. That may yet be related in some fashion to Duncan's
>>>>>> hypothesis.
>>>>>>
>>>>>> That being said, the use of the tilde character as a suffix to
>>>>>> denote that a file is a backup version, is not limited to Fedora or
>>>>>> Linux, for that matter. It is quite common for many text editors
>>>>>> (eg. Emacs) to use this. As a result, it is also common for many
>>>>>> applications to ignore files that have a tilde suffix.
>>>>>>
>>>>>> Based upon your follow up posts to the original thread, it would
>>>>>> seem that you do not have any backups. The default ext3 file system
>>>>>> that is used on modern Linuxen, by design, makes it a bit more
>>>>>> difficult to recover deleted files. This is due to the unlinking of
>>>>>> file metadata at the file system data structure level, as opposed to
>>>>>> simply marking the file as deleted in the directory structures, as
>>>>>> happens on Windows.
>>>>>>
>>>>>> There is a utility called ext3undel
>>>>>> (http://projects.izzysoft.de/trac/ext3undel ), which is a wrapper of
>>>>>> sorts to other undelete utilities such as PhotoRec and foremost. I
>>>>>> have not used it/them, so cannot speak from personal experience. Thus
>>>>>> it would be a good idea to engage in some reviews of the
>>>>>> documentation and perhaps other online resources before proceeding.
>>>>>> The other consideration is the Catch-22 of not copying anything new
>>>>>> to your existing HD, for fear of overwriting the lost files with new
>>>>>> data. So you would need to consider an approach of downloading these
>>>>>> utilities via another computer and then running them on the computer
>>>>>> in question from other media, such as a CD/DVD or USB HD.
>>>>>>
>>>>>> A more expensive option would be to use a professional data recovery
>>>>>> service, where you would have to consider the cost of recovery
>>>>>> versus your lost time. One option would be Kroll OnTrack UK
>>>>>> (http://www.ontrackdatarecovery.co.uk/ ). I happen to live about a
>>>>>> quarter mile from their world HQ here in a suburb of Minneapolis. I
>>>>>> have not used them myself, but others that I know have, with good
>>>>>> success. Again, this comes at a potentially substantial monetary cost.
>>>>>>
>>>>>> The key is that if you have any hope to recover the deleted files,
>>>>>> you not copy anything new onto the hard drive in the mean time.
>>>>>> Doing so will decrease the possibility of file recovery to near 0.
>>>>>>
>>>>>> As Duncan noted, there is great empathy with your situation. We have
>>>>>> all gone through this at one time or another. In my case, it was
>>>>>> perhaps 20+ years ago, but as a result, I am quite anal retentive
>>>>>> about having backups, which I have done for some time on my systems,
>>>>>> hourly.
>>>>>>
>>>>>> HTH,
>>>>>>
>>>>>> Marc Schwartz
>>>>>>
>>>>>> On Jul 28, 2010, at 5:55 AM, Jarrod Hadfield wrote:
>>>>>>> Hi Martin,
>>>>>>>
>>>>>>> I think this is the most likely reason given that the name in the
>>>>>>> DESCRIPTION file does NOT have a version number. Even so, it is
>>>>>>> very easy to misname a file and then delete it/change its name (as
>>>>>>> I've done here) and I hope current versions of R would not cause
>>>>>>> this problem. Perhaps Fedora should not use ~ as its back up file
>>>>>>> suffixes?
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Jarrod
>>>>>>>
>>>>>>> On 28 Jul 2010, at 11:41, Martin Maechler wrote:
>>>>>>>>>>>>> Jarrod Hadfield <j.hadfield_at_ed.ac.uk>
>>>>>>>>>>>>> on Tue, 27 Jul 2010 21:37:09 +0100 writes:
>>>>>>>>> Hi, I ran R (version 2.9.0) CMD build under root in
>>>>>>>>> Fedora (9). When it tried to remove "junk files" it
>>>>>>>>> removed EVERYTHING in my local account! (See below).
>>>>>>>>>
>>>>>>>>> Can anyone tell me what happened,
>>>>>>>> the culprit may lie here:
>>>>>>>>>> * removing junk files
>>>>>>>>>> unlink MCMCglmm_2.05/R/ residuals.MCMCglmm.R
>>>>>>>>>> ~
>>>>>>>> where it seems that someone (you?) has added a newline
>>>>>>>> in the filename, so instead of
>>>>>>>> 'residuals.MCMCglmm.R~'
>>>>>>>> you got
>>>>>>>>
>>>>>>>> 'residuals.MCMCglmm.R
>>>>>>>> ~'
>>>>>>>>
>>>>>>>> and the unlink / rm command interpreted '~' as your home
>>>>>>>> directory.
>>>>>>>>
>>>>>>>> But I can hardly believe it.
>>>>>>>> This explanation seems a bit doubtful to me...
>>>>>>>>
>>>>>>>>> and even more importantly if I can I restore what was lost.
>>>>>>>> well, you just get it from the backup. You do daily backups, do
>>>>>>>> you?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Martin Maechler, ETH Zurich
>>>> ______________________________________________
>>>> R-help_at_r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html and provide commented,
>>>> minimal, self-contained, reproducible code.
>> ______________________________________________
>> R-devel_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel


-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd.mes_at_cbs.dk  Priv: PDalgd_at_gmail.com

Received on Sun 01 Aug 2010 - 21:52:20 GMT
