[R] DocumentTermMatrix - text mining

From: Matevž Pavlič <matevz.pavlic_at_gi-zrmk.si>
Date: Fri, 20 May 2011 17:30:52 +0200


Hi All,  

I have a data.frame that looks like the one below. I would like to do some text mining on it to look for patterns between Opis, ACKlasifikacija and Vodja. I looked at the tm package, which looks promising, specifically DocumentTermMatrix or TermDocumentMatrix, but I cannot figure out how to convert my data from a data.frame to a Corpus or VCorpus.

     Globina ACKlasifikacija                                                                                                   Opis GlobinaOd GlobinaDo   Vodja
3671       8              GP                           SLABO GRADUIRAN PEŠČEN PROD DO r = 70 mm, PREVLADUJE DO r = 30 mm, GOST, SIV      0.30      4.05 Beljsak
3675      12              GP            SLABO GRADUIRAN PEŠČEN PROD DO r = 80mm, PREVLADUJE DO r = 30mm, GOST, VLAŽEN DO MOKER, SIV      0.40      7.50 Kovacic
3684       8              GP                   SLABO GRADUIRAN PEŠČEN PROD DO r = 70 mm, PREVLADUJE DO r = 30 mm, SREDNJE GOST, SIV      4.00      6.15 Beljsak
3689      10              GP            SLABO GRADUIRAN PEŠČEN PROD DO r = 80mm, PREVLADUJE DO r = 30mm, GOST, VLAŽEN DO MOKER, SIV      0.20      5.20 Kovacic
3695      10              GP                         SLABO GRADUIRAN PEŠČEN PROD DO r = 70mm, PREVLADUJE DO 30mm, GOST, VLAŽEN, SIV      0.90      6.00 Kovacic
3699      10              GP               SLABO GRADUIRAN PEŠČEN PROD DO r = 90mm, PREVLADUJE DO r = 30mm, GOST, MOKER, SVETLORJAV      0.35      4.85 Kovacic
3706      10              GP                     SLABO GRADUIRAN PEŠČEN PROD DO r = 70mm, PREVLADUJE DO r = 30mM, GOST, VLAŽEN, SIV      0.50      4.10 Kovacic
3713      10              GP                     SLABO GRADUIRAN PEŠČEN PROD DO r = 80mm, PREVLADUJE DO r = 30mm, GOST, VLAŽEN, SIV      1.00      4.00 Kovacic
3739      32              GP                              SLABO GRADUIRAN, ZELO PEŠČEN PROD, MALO MELJAST, SREDNJE GOST, MOKER, SlV     15.40     16.00 Fasalek
3761      19              GP                             SLABO GRADUIRAN MELJAST TER PEŠČEN PROD, VLAŽEN DO MOKER, PROD DO r = 50MM      7.10     11.00 Fasalek
3801      10              GP SLABO GRADUIRAN PEŠČEN PROD DO r = 70 mm, PREVLADUJE DO r = 30 mm, Z VEČJIMI PRODNIKI, GOST, SIVO RJAV      0.60      4.50 Beljsak

 

Any help or ideas would be greatly appreciated,

m  




R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Fri 20 May 2011 - 15:33:06 GMT


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 20 May 2011 - 15:50:08 GMT.
