[R] constructing a dataframe from a database of newspaper articles

From: Bob Green <bgreen_at_dyson.brisnet.org.au>
Date: Sun 23 Jul 2006 - 16:39:00 EST

I am hoping for some assistance with formatting a large text file that consists of a series of individual records. Each record includes specific labels/field names (a sample of one record, one of the longest, is at the end of this post). What I want to do is reformat the data so that each individual record becomes a row (some cells will have a lot of text). The column variables I want are: (a) HD in one column, (b) BY in one column, (c) WC data in one column, (d) PD data in one column, (e) SC data in one column, (f) PG data in one column, and (g) LP and TD text in one column; this column can contain quite a lot of text, e.g. 1900 words. The other fields are unwanted.

If there were 150 individual records, when formatted this would be a 7 column by 150 row dataset.

I was advised to:

  1. read in the file using readLines giving a character vector one element per input line.
  2. convert that to lines of the form: id op text where each such line is a field and multiline fields have been collapsed into a single line of text. This step involves detailed processing and you could do it in a loop or you could try a vectorized approach. A vectorized approach will likely involve using
  3. convert the lines created above to a data frame with three columns,
  4. use reshape to create a "wide" data frame, and
  5. write it out using write.csv.
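
The five steps above can be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not a tested solution for the real file: the function name parse_records, the list of field labels, the rule that every record starts with an HD line, and the demo lines are all assumptions made for illustration. A continuation line that happens to begin with one of the labels followed by a space (for example "RE ...") would be misread as a new field.

```r
# Hypothetical helper: turn labelled lines into one row per record.
# Assumes each record starts with an "HD" line and each field label is
# two capital letters followed by a space at the start of a line.
parse_records <- function(lines) {
  tags <- c("HD", "BY", "WC", "PD", "SN", "SC", "PG",
            "LA", "CY", "LP", "TD", "NS", "RE", "AN")
  keep <- c("HD", "BY", "WC", "PD", "SC", "PG", "LP")  # TD is folded into LP

  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]                        # drop blank lines

  # Step 2: collapse each multi-line field into a single string
  is_start <- grepl(paste0("^(", paste(tags, collapse = "|"), ") "), lines)
  field_no <- cumsum(is_start)                         # 0 = text before any label
  in_field <- field_no > 0
  fields <- as.vector(tapply(lines[in_field], field_no[in_field],
                             paste, collapse = " "))

  # Step 3: "long" data frame with three columns: id, op, text
  long <- data.frame(op   = substr(fields, 1, 2),
                     text = trimws(substring(fields, 4)),
                     stringsAsFactors = FALSE)
  long$id <- cumsum(long$op == "HD")                   # record number
  long$op[long$op == "TD"] <- "LP"                     # LP and TD share a column
  long <- long[long$id > 0 & long$op %in% keep, ]
  long <- aggregate(text ~ id + op, data = long, FUN = paste, collapse = " ")

  # Step 4: reshape to "wide", one row per record
  wide <- reshape(long, idvar = "id", timevar = "op", direction = "wide")
  names(wide) <- sub("^text\\.", "", names(wide))
  wide <- wide[order(wide$id), c("id", intersect(keep, names(wide)))]
  rownames(wide) <- NULL
  wide
}

# Invented demo lines standing in for the output of readLines() (step 1)
demo <- c("   HD First headline split",
          "   over two lines",
          "   BY By A. Writer.",
          "   WC 12 words",
          "   PD 23 June 2001",
          "   SN Courier Mail",
          "   SC COUMAI",
          "   PG 30",
          "   LP Lead paragraph",
          "continues here.",
          "   TD Rest of the text.",
          "   AN Document x1",
          "   HD Second headline",
          "   BY By B. Writer.",
          "   LP Only a lead paragraph.")
articles <- parse_records(demo)

# Step 5 (file name is a placeholder):
# write.csv(articles, "articles.csv", row.names = FALSE)
```

Fields that a record lacks come out as NA in that record's row, and anything after the last wanted label (such as a copyright footer) ends up in an unwanted field and is dropped.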

I have got as far as reading the text into R, though I am unsure whether the warning is a problem. I am, however, not at all sure what I need to do next.

Any assistance is much appreciated,


(A) syntax

  mht <- readLines("c:\\cm-mht1.txt", n = -1)   # one element per input line
  mht

[8376] "© 2006 Dow Jones Reuters Business Interactive LLC (trading as Factiva). All "
[8377] "rights reserved."
Warning message:
incomplete final line found by readLines on 'c:\cm-mht1.txt'
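
This warning usually just means that the last line of the file does not end with a newline character; for an ordinary file connection readLines still returns that line. A small sketch with a temporary file (the file contents are invented for illustration) shows this, using the warn argument of readLines to silence the message:

```r
# Write a two-line file whose last line has no trailing newline
tmp <- tempfile(fileext = ".txt")
cat("first line\nlast line without newline", file = tmp)

# warn = FALSE suppresses the "incomplete final line" warning;
# the final line is read either way
x <- readLines(tmp, warn = FALSE)
length(x)
x[2]

unlink(tmp)   # clean up the temporary file
```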

(B) sample data

       HD Was Charles Manson temporarily insane when he led a wild killing
       rampage in the US in 1969?
       BY By Deborah Cassrels.
       WC 1834 words
       PD 23 June 2001
       SN Courier Mail
       SC COUMAI
       PG 30
       LA English
       CY (c) 2001 Queensland Newspapers Pty Ltd

       LP Was Charles Manson temporarily insane when he led a wild killing
       rampage in the US in 1969? Clearly he was mad and bad. But would
       Queensland have placed him before its Mental Health Tribunal, found him of
       unsound mind at the time of his crimes, institutionalised him and
       "treated" his illness? WHY is Queensland the only jurisdiction in the
Commonwealth with a Mental Health Tribunal which establishes if an accused is fit to face trial or of unsound mind at the time of an alleged offence? Why is mental incompetence not determined in an adversarial court by a jury? Under the Mental Health Act 1974, the tribunal, a statutory body operating since 1985, comprises three-yearly appointments of a Supreme Court judge and two assisting psychiatrists, whose advice does not have to be accepted. The judge alone constitutes the tribunal, an inquisitorial process conducted in the Supreme Court in Brisbane.

       TD Victims or family are not notified of hearings or allowed to submit victim impact statements. They are prohibited from talking to the media until 28 days after the decision. And when patients return to the community there is no requirement for neighbours or victims to be notified. Is this legislation enlightened or are we just suckers, falling for time and money-saving strategies? The tribunal has earned a reputation as progressive, humane and economical among some judges who have presided over it. The inaugural chair, former Supreme Court judge Angelo Vasta QC, thinks the tribunal system is "enlightened" and "it saves an enormous amount of expenditure". He points to the humane side of treating the ill in a secure hospital rather than punishing them for offences but is uncomfortable with borderline cases. "Whether people are mad or bad ought to be established by a very thorough investigation. The associated Patient Review Tribunals (of which there are five) consist of three to six members, including the chair who is a legal officer, a medical practitioner and a mental health professional. A psychiatrist is not required. The other three have no specific qualifications and can include former patients. The tribunals operate in closed hearings and patients of unsound mind or unfit for trial are reviewed every 12 months.
Leave is granted either by the Mental Health Tribunal or the Patient Review Tribunal, which determine when a restricted patient is discharged into the community. Says the Director of Mental Health, Dr Peggy Brown:
"In the case of serious offences you can be assured the period of
monitoring is quite lengthy." Under the Mental Health Act 2000 to be implemented late this year, the tribunal will be replaced by a Mental Health Court and the Patient Review Tribunal by the Mental Health Review Tribunal. Queensland Health Minister Wendy Edmond says the name change reflects transparency, with proceedings under oath and cross-examination of witnesses. The legislation represents "real change to the rights of victims of crime". But there is still an embargo on publishing decisions in the media. Dr Brown says when patients are granted leave, victims or families can apply to be notified but decisions will be made on individual cases. "The (new) tribunal has to establish that there are reasonable grounds for the notification order to be made ... and it's also an appealable decision," returning to the Mental Health Court. Brown says there are efficiencies in the new legislation but "it's not about saving money". The main advantages were that victims could make submissions to both bodies. Concerns still might not be addressed but reasons were expected to be provided. The court's composition and sole power of the judge will be retained. Victims or relatives can be notified of hearings and decisions about the patient. If not, reasons must be provided. The Patient Review Tribunals will be replaced by one tribunal with hearings still closed. It will comprise up to five members including a president (a lawyer of at least seven years' standing), psychiatrist or medical practitioner and community members and it will be chaired by a legal officer. Leave will be approved by the corresponding previous bodies. Chief Justice Paul de Jersey who presided over the 1995 case of Ross Farrah, a paranoid schizophrenic, who after murdering his girlfriend, Christine Nash, was allowed out of the John Oxley Centre to play sport and see movies, says the proposed legislative changes to the Mental Health Act appear to be "refinements". 
Two weeks ago, Nash's teenage son Wade committed suicide after suffering years of torment following his mother's murder. In May 1996, a letter was sent to the tribunal by now former director of secure care services at John Oxley Dr Peter Fama. It said:
"Should Ross be committed to the Tribunal for trial on a charge of
manslaughter or murder, I have to report that he is now fit to be placed in corrective custody ... There is no clinical need for further detention of Ross in hospital." De Jersey has been involved in the process of amendments in the new Act and believes the "adjustments" are satisfactory:
"It's probably a question of how they're implemented. I thought the
changes were more concerned with image than effecting substantial change to the system, calling it a court rather than a tribunal. There is some attempt to enhance the openness of the procedures such as the advice given by the existing psychiatrists being revealed in open court to the judge but they're aspects of streamlining rather than substantive change." He says many people are irked by a perceived disproportion between the treatment of mentally ill offenders and their victims. "As a community we need much more positively to address the situation of victims." De Jersey points to the James Bulger murder in the UK eight years ago when two 10-year-old boys abducted and battered James, two, to death. The killers are expected to be freed soon. Says de Jersey: "Whatever one thinks of future plans for the young offenders it is extraordinary, if reportedly correct, that so little help has been given to the bereft mother of the murdered toddler. "Similarly, here, it is generally indefensible where victims or the families of victims are not informed of details of the likely release of their offenders, and even before that where they are not given a proper explanation as to the process and counselling to help them comprehend that process and as well the consequences of the crime. We are as a community moving towards a greater focus on the position of victims but a lot more needs to be done. "The anguish of victims and the families of victims that insane offenders appear to escape punishment is understandable. The issue is whether the community is prepared to accept that insane offenders primarily need treatment." The Mental Health Tribunal worked on two assumptions, that offenders of unsound mind should, in the interests of the community, be treated rather than punished, and that a determination whether an offender was of unsound mind could responsibly be made by a Supreme Court judge with expert psychiatric assistance. 
"I have wondered whether with the ultimately serious crimes such as murder the community may not reasonably demand that in the interests of reassurance that the determination be made by a jury." He believes the community's longer term interests would best be served by medically treating insane offenders in a hospital rather than a prison, where if rehabilitated, they could contribute to the community. "I accept, however, that in many cases there will be serious residual concern, for example, can the offender be trusted, if left unsupervised, to continue to take the relevant medication?" De Jersey admits problems have arisen when offenders, granted leave, stopped taking medication but says if they can be relied upon to maintain stability through medication it would be inhumane to keep them locked up. Continued medical monitoring was necessary. If conditions were breached the person should be returned to restricted custody at the psychiatric hospital. While the most vulnerable in society deserve compassion it does not surprise there is public concern about lack of proper scrutiny, the capacity to re-offend and misuse of the legal process by using insanity as a defence. IN the general quest to improve treatment provisions for patients the 2000 Act says: "The new legislation provides for involuntary treatment in the community as an alternative to being an in-patient in a mental health service which reflects contemporary clinical practice and the principle of reform that involuntary treatment must be in the least restrictive form." Perhaps the overwhelming feeling is patients' rights have priority over victims' rights. Ted Flack, spokesman for the Queensland Homicide Victims Support Group says the new Act provides a better environment for victims' participation, but there are serious flaws. The rights of homicide victims were not guaranteed and this caused an inordinate amount of distress.
"There's still considerable discretion in the hands of the Mental
Health Court and the Mental Health Review Tribunal as to whether they would admit any evidence from the victims. The new Act is framed in such a way as to provide guaranteed rights to the person who's suffering from a mental illness and those rights come appropriately from the international conventions, but there are similar international conventions for victims and they are being completely ignored in the Act." Flack says the primary purpose of the Mental Health Tribunal is to save money and to safeguard the rights of the mentally disabled person. He believes the criminally insane can be catered for properly in jail. "The imprecise science of psychiatry is not an appropriate set of guidelines for the release into the community of dangerous killers," he says.

       NS GCAT : Political/General News | GCRIM : Crime/Courts | GHEA : Health | GHOME : Law Enforcement
       RE AUSNZ : Australia and New Zealand | AUSTR : Australia
       AN Document coumai0020010710dx6n005vl

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Sun Jul 23 16:55:18 2006
