[R] Text analysis question


From: Andrew Perrin (clists@perrin.socsci.unc.edu)
Date: Thu 12 Jun 2003 - 08:10:53 EST

Message-id: <Pine.LNX.4.53.0306111810410.16755@perrin.socsci.unc.edu>

I'm grappling with a problem and would appreciate any thoughts on it.

I'm revising a paper for resubmission to a journal. For the paper, I've
coded each "turn" in a series of conversations with several binary codes.
(A turn is one package of statements made by one speaker, starting with
the beginning of the speech and ending when the speaker stops or is
interrupted.) The reviewers want me to justify the decision I made to code
each turn individually, ignoring (for this analysis) the turns that
surround each turn.

My thought is to run a logistic regression, predicting the
presence/absence of a code in a given turn, with independent variables
being the number of turns that have elapsed since each code was last used
in the conversation. No problem so far. The problem involves treating what
are essentially missing data. If I simply omit cases in which one or more
variables are missing, it's a very conservative test, since it includes
only turns for which all codes have already occurred once in the
conversation.
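
A minimal sketch of the lag variables described above, written in Python
for illustration. It assumes each turn is stored as a set of code names;
the function name, the code labels "A" and "B", and the toy conversation
are all hypothetical. A value of None marks a code that has not yet been
used, which is the missing-data case in question.

```python
from typing import Dict, List, Optional, Set


def turns_since_last_use(
    turns: List[Set[str]], codes: List[str]
) -> List[Dict[str, Optional[int]]]:
    """For each turn, count the turns elapsed since each code last appeared.

    A code that has not appeared earlier in the conversation gets None.
    """
    last_seen: Dict[str, int] = {}
    rows: List[Dict[str, Optional[int]]] = []
    for i, present in enumerate(turns):
        # Lags look only at earlier turns, so build the row before updating.
        rows.append(
            {c: (i - last_seen[c]) if c in last_seen else None for c in codes}
        )
        for c in present:
            last_seen[c] = i
    return rows


# Hypothetical four-turn conversation coded with two binary codes.
turns = [{"A"}, set(), {"B"}, {"A", "B"}]
rows = turns_since_last_use(turns, ["A", "B"])

# Listwise deletion keeps only turns where every lag is defined.
complete = [i for i, row in enumerate(rows) if None not in row.values()]
```

In this toy example only the last turn survives listwise deletion, which
shows how much of a conversation the conservative test can discard.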

An alternative is to set the number of turns that have elapsed since the
last use of a code to a suitably high number--probably 1 + the total number
of turns elapsed in the conversation--which would let me include all
statements (including those that introduce codes into a conversation) but
also would inflate the influence of prior use on current use by
postulating a nonexistent use "just before" the conversation.
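
A sketch of this alternative, again in Python and under stated
assumptions: missing lags are represented as None, and "total number of
turns" is read as the full length of the conversation. The function name
and the example values are hypothetical.

```python
from typing import Dict, List, Optional


def fill_never_used(
    rows: List[Dict[str, Optional[int]]], n_turns: int
) -> List[Dict[str, int]]:
    """Replace missing lags with a sentinel of 1 + the conversation length."""
    sentinel = n_turns + 1
    return [
        {code: sentinel if lag is None else lag for code, lag in row.items()}
        for row in rows
    ]


# Hypothetical lags for the first two turns of a four-turn conversation.
lags = [{"A": None, "B": None}, {"A": 1, "B": None}]
filled = fill_never_used(lags, n_turns=4)
```

Every turn is retained this way, but each filled value behaves as if the
code had been used at a fixed point before the conversation began, which
is the inflation concern raised above.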

I hope this is clear enough to be informative. I'd be interested in any
thoughts folks might have.

Andy Perrin

Andrew J Perrin - http://www.unc.edu/~aperrin
Assistant Professor of Sociology, U of North Carolina, Chapel Hill
clists@perrin.socsci.unc.edu * andrew_perrin (at) unc.edu

R-help@stat.math.ethz.ch mailing list

