Re: [R] using spec.pgram

From: Anthony Mathelier <anthony.mathelier_at_gmail.com>
Date: Mon, 16 Jun 2008 18:14:40 +0200

OK, it seems like I do not succeed in expressing what I do, or want to do. So, I give you the example that bring me to this kind of analysis. I wrote the paper "Chromosomal periodicity of evolutionary conserved gene pairs" (which you can download at http://www.pnas.org/cgi/reprint/104/25/10559). In figure 2, they have a histogram of distances between genes on a chromosome and they make a discrete fourier transform analysis to exhibit a period of 117kb. They explain how they did in the first paragraph of "Distributions of distances and positions and fourier transform" (last page). I thought that this kind of analysis was made by spec.pgram with a histogram. But perhaps I am wrong because I really do not understand what they mean by "the histogram was tranformed into a continuous probability density by using a Gaussian smoothing window and normalizing the total density over the entire genome to 1. A discrete Fourier transform of the data were computed from 0 to 1,000kb by using a Tukey window to taper the end (ratio of 0.5 for tapered to untapered length.".
I hope it explains better what I want to obtain from my distances. Best regards,

Anthony

On Mon, Jun 16, 2008 at 5:25 PM, stephen sefick <ssefick_at_gmail.com> wrote:

> To get some sort of frequency which in your case seem to be cycles per
> distance? Is a valid use of a fourier transform as long as it is a distance
> that is measured in a way that would be analogous to a time series- In
> other words if the distance proceeds from an origin in one direction-
> geophysicists do this often with the realization of an earthquake picked up
> by sensors that are a distance away from the origin of the epicenter, but
> they are looking for coherencies in the signal from one place to the next in
> the frequency domain seperated by distance- this is called beam forming-
> They use the raw signal- by binning (making a histogram) the data you are
> loosing the signal- you are looking at frequency of occurance of certain
> values not for the underlying periodicities of the data (in time or
> space). You are fitting cos and isin function to you data to see if there
> is periodicity- the power is the integration of the convolution of this sin
> and cosine function with your data- It seems to me meaningless to preform
> this convolution agianst something that is not a signal (the histogram). If
> you want to use a frequency domain technique you have to have a frequency to
> investigate- a histogram does not have this- I is a frequency of occurance
> by bin size which is NOT what you want (your would have cycles/binlength
> that doesn't make any sense to me) to do this analysis on- You want a
> signal- dissolved oxygen curve, sunspot record, etc. through time, or
> distance as stated above- you are looking for the frequency of a waveform-
> Anyway, I may be misunderstanding- supply some code and explain the data
> otherwise this line of though- in my limited expertise- is a dead end, but
> agian I still don't know what it is that you are, exactly, trying to do- and
> what your dataset constits. I hope these ruminations help
>
> I recommend doing this analysis on the raw data- It doesn't matter that you
> don't have the same amount of data points- as long as both sets of data have
> circa ten times the length of (cycles/distance) what you want to detect- If
> things in your case are spaced by one meter then the lowest cycle
> perdistance that you can reliably detect if 0.5 meters, this is all
> speculation because you don't have a problem with reproducible code, and we
> have no idea what you are measuring or what your data looks like- without
> this information there is no way that I can say one way or the other that
> you approach (suggested non-histogram) would be right or wrong.
>
> Stephen
>
>
> On Mon, Jun 16, 2008 at 9:33 AM, Anthony Mathelier <
> anthony.mathelier_at_gmail.com> wrote:
>
>> Perhaps I'm applying spec.pgram wrong as you said. I will explain what I
>> want, so you can tell me why I'm wrong and perhaps what I have to do to do
>> it well.
>> I have some points in a 1-D space and I want to know if they are spaced at
>> a certain periodic distance. So, I computed all the distances between points
>> in my space. Then, I would like to know if a certain distance (period), or
>> multiples of a certain distance, is preferred to space my data. I made a
>> histogram of the distances and apply the spec.pgram function to know the
>> frequence (so the period) which is the most important to space the original
>> data.
>> But, when I have to sets of data (without necessarily the same number of
>> observation in each set), I want to compare the importance of the period
>> given by spec.pgram between the sets. Could I normalize the amplitude of the
>> peaks given by spec.pgram?
>> So, am I wrong to apply this methodology to exhibit a periodic distance
>> between my data? If, true, what could you recommend me to do this?
>> Thanks in advance for your answers.
>> Best regards,
>>
>> Anthony
>>
>> On Tue, Jun 10, 2008 at 6:13 PM, stephen sefick <ssefick_at_gmail.com>
>> wrote:
>>
>>> I from a first thought I would say that you are apply this wrong! The
>>> fourier transform convolves a function (cos(x)+isin(x) (this may not be the
>>> exact formula but I don't have my books near)) to the data and then
>>> integrates over -1/2 to 1/2 takes the modulus and plots this- the
>>> periodogram. The reason you preform a fourier transform is to look at
>>> recurring frequencies in the data, which are in the time domain. The
>>> fourier transform converts the time series into the frequency domain and
>>> viola you have a peak into the hidden/recurring parts of your signal. From
>>> your explaination your are applying this technique wrong- look at schumway,
>>> MASS4, et al. books to get a handle on how this technique is used. If you
>>> are to apply a time series analysis please use it on a time series. Maybe
>>> your logic is not flawed but I don't see how a histogram with its associated
>>> binning is a better candidate for time series analysis than the original
>>> time series if at all.
>>> good luck
>>>
>>> Stephen
>>>
>>> On Tue, Jun 10, 2008 at 8:49 AM, Matthieu Stigler <
>>> Matthieu.Stigler_at_gmail.com> wrote:
>>>
>>>> Hello
>>>>
>>>> I don't know exactly what you want to do but:
>>>>
>>>> -why do you use in your example h$counts and not h? Furthermore helpl
>>>> file says it should be a time series, why then rather not your time series?
>>>>
>>>> -usually na.action will make the "default" action, which you can see by
>>>> getOptions("na.action")
>>>>
>>>> -here in this function it is provided in the function values na.action =
>>>> na.fail so it will just remove the NA in the time series
>>>>
>>>> -if you want to study a function, I advise you to copy it entirely,
>>>> rename it and then just insert print(curiousobject...) in the function, this
>>>> will allow you to let the function run and grasp the interessting objects,
>>>> like:
>>>>
>>>> study<-function (x, spans = NULL, kernel = NULL, taper = 0.1, pad = 0,
>>>> fast = TRUE, demean = FALSE, detrend = TRUE, plot = TRUE,
>>>> na.action = na.fail, ...)
>>>> {
>>>> series <- deparse(substitute(x))
>>>> x <- na.action(as.ts(x))
>>>> print(x)
>>>> xfreq <- frequency(x)
>>>> ...}
>>>> study(sunspots)
>>>>
>>>> -when you provide an example, instead of giving an external reference
>>>> for the data, try to search a convenient internal data (accessed by data()
>>>> ), so one will be able to reproduce your problems. Here you could use
>>>> sunspots
>>>>
>>>> -to obtain the commented code... I don't know it...
>>>>
>>>> -good luck
>>>>
>>>> Matthieu
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Hi everyone,
>>>>>
>>>>> first of all, I would like to say that I am a newbie in R, so I
>>>>> apologize in
>>>>> advance if my questions seem to be too easy for you.
>>>>>
>>>>> Well, I'm looking for periodicity in histograms. I have histograms of
>>>>> certain phenomenons and I'm asking whether a periodicity exists in
>>>>> these
>>>>> data. So, I make a periodogram with the function spec.pgram. For
>>>>> instance,
>>>>> if I have a histogram h, I call spec.pgram by spec.pgram (h, log="no",
>>>>> taper=0.5). So, I have some peaks that appear and I would like to
>>>>> interpret
>>>>> them but I do not know how they are computed and so what a peak with a
>>>>> value
>>>>> of 10000 represents in comparison with a peak of value 600 with another
>>>>> histogram.
>>>>> I looked at the source code of the function spec.pgram to better
>>>>> understand
>>>>> what is behind. But, when I apply the source code line by line, I've
>>>>> got a
>>>>> problem. For instance, I make:
>>>>>
>>>>>
>>>>>> >data = scan ("file.txt")
>>>>>> >h = hist (data, breaks=max(data)/5000)
>>>>>>
>>>>>>
>>>>> #then I apply the first two lines of the spec.pgram function
>>>>>
>>>>>
>>>>>> >series <- deparse(substitute(h$counts))
>>>>>> >x <- na.action(as.ts(h$counts))
>>>>>> >x
>>>>>>
>>>>>>
>>>>> NULL
>>>>> I do not understand why when I apply the first two lines of the
>>>>> function I
>>>>> have x which is equal to NULL (which make a mistake in the following
>>>>> lines
>>>>> of the code) but if I apply the function directly with h$counts it
>>>>> gives me
>>>>> a result.
>>>>> So, if someone can explain to me what is the problem and/or how
>>>>> spec.pgram
>>>>> exactly computes the periodogram and how to interpret it with my data,
>>>>> I
>>>>> would be so grateful.
>>>>> And subsidiary questions:
>>>>> - Is it possible to have the commented source code of the function?
>>>>> - I do not understand what is the function na.action in the second line
>>>>> of
>>>>> spec.pgram, so if you can explain it to me.
>>>>>
>>>>> Thanks in advance for your answers.
>>>>> Best regards,
>>>>>
>>>>> Anthony Mathelier
>>>>>
>>>>> [[alternative HTML version deleted]]
>>>>>
>>>>
>>>> ______________________________________________
>>>> R-help_at_r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>>
>>> --
>>> Let's not spend our time and resources thinking about things that are so
>>> little or so large that all they really do for us is puff us up and make us
>>> feel like gods. We are mammals, and have not exhausted the annoying little
>>> problems of being mammals.
>>>
>>> -K. Mullis
>>
>>
>>
>
>
> --
> Let's not spend our time and resources thinking about things that are so
> little or so large that all they really do for us is puff us up and make us
> feel like gods. We are mammals, and have not exhausted the annoying little
> problems of being mammals.
>
> -K. Mullis
>

        [[alternative HTML version deleted]]



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Mon 16 Jun 2008 - 19:39:56 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 16 Jun 2008 - 20:32:15 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive