Re: [R] using spec.pgram

From: Anthony Mathelier <anthony.mathelier_at_gmail.com>
Date: Tue, 17 Jun 2008 09:58:59 +0200

Even if the signal given by the histogram is not really a signal, it seems that spec.pgram can give an interesting evaluation of how the genes are spaced in the chromosome, like in the article. So now, when I study a chromosome with 200 interesting genes, I would like to compare the amplitude of the spectrum of the periodogram given by spec.pgram (applied on the histogram of the distances between genes) with another periodogram for a chromosome with 350 interesting genes. As the spectrum seems to be calculated approximately like fft(x)^2/N (according to the computation of pgram[] in spec.pgram but perhaps I am wrong) where x stands for the signal and N for the number of observations in the signal, I suppose I can compare to periodogram (and the values of the peaks) by dividing the value of the spectrum by N. I apply this kind of technique with a cosine signal like this:
> x= 1:1000
> cox = cos (x)
> spec.pgram (cox, log="no", taper=0.5)
> x= 1:100
> cox = cos (x)
> spec.pgram (cox, log="no", taper=0.5)
and the two periodograms have an amplitude of the peak equals when we divide it by the corresponding Ns.
Am I wrong if I use this technique to compare my periodograms?

Thanks in advance.
Best regards,

Anthony

On Mon, Jun 16, 2008 at 8:37 PM, stephen sefick <ssefick_at_gmail.com> wrote:

> OK so this is what I think- The gaussian smoothing window is just like the
> smooth curve that i suggested that you draw on top of the histogram (try
> looking at ?density and on the R site search page). They take this and then
> make the density = 1. I am not sure how this is done, but i am sure that
> you could figure it out. I believe that what they are doing is still taking
> the area under the curve at discrete "periodicities" (I would not call these
> periodicities because there is not really a periodic part of this signal-
> there is re-occurance, but not really periodicity) and that is what the
> "power spectrum" is revealing. This is not a typical use of the fourier
> transform, but may be valid. this signal is not stationary So I would
> suggest using wavelet analysis, but it still does not seem be a classical
> signal analysis problem- I would look at Price et al. in the reference
> section to see if there is presidence for this type of analysis. But from
> my experience in time domain to frequency domain problems this does not fit
> the model of data that I have worked with, and therefore it may be a bias on
> my part, but I would use the histogram as my justification for the distances
> being significant.
> Good Luck
>
> Stephen
>
>
> On Mon, Jun 16, 2008 at 2:04 PM, stephen sefick <ssefick@gmail.com> wrote:
>
>> i am reading the paper and trying to figure out what they are doing. At
>> this point in time it looks like what they are doing is using the value at
>> the top of the histogram bar as the value at the distance on the x-axis.
>> they then use the equivalent of spec.pgram. My nearest approximation of
>> what this does is that this analysis is integrating the area under the curve
>> at a particular time. It doesn't seem that there is any periodicity in this
>> data because of the fact that there isn't a real signal here- it is binned
>> by distance between the genes. Not to say that spectral density is not
>> valid, but is is not a periodicity that this analysis is look at rather an
>> amount of power (area under the curve). I am not entirely sure that this is
>> any more information than what is contained in the historgrams. Take a pen
>> and draw from the top of each box starting on the left- This is the
>> "signal" that is being analyzed. I need to read the rest of the paper and
>> think about it a little bit more. If you have any ideas- pass them along.
>>
>> Stephen
>>
>>
>> On Mon, Jun 16, 2008 at 12:14 PM, Anthony Mathelier <
>> anthony.mathelier_at_gmail.com> wrote:
>>
>>> OK, it seems like I do not succeed in expressing what I do, or want to
>>> do. So, I give you the example that bring me to this kind of analysis. I
>>> wrote the paper "Chromosomal periodicity of evolutionary conserved gene
>>> pairs" (which you can download at
>>> http://www.pnas.org/cgi/reprint/104/25/10559). In figure 2, they have a
>>> histogram of distances between genes on a chromosome and they make a
>>> discrete fourier transform analysis to exhibit a period of 117kb. They
>>> explain how they did in the first paragraph of "Distributions of distances
>>> and positions and fourier transform" (last page). I thought that this kind
>>> of analysis was made by spec.pgram with a histogram. But perhaps I am wrong
>>> because I really do not understand what they mean by "the histogram was
>>> tranformed into a continuous probability density by using a Gaussian
>>> smoothing window and normalizing the total density over the entire genome to
>>> 1. A discrete Fourier transform of the data were computed from 0 to 1,000kb
>>> by using a Tukey window to taper the end (ratio of 0.5 for tapered to
>>> untapered length.".
>>> I hope it explains better what I want to obtain from my distances.
>>> Best regards,
>>>
>>> Anthony
>>>
>>>
>>> On Mon, Jun 16, 2008 at 5:25 PM, stephen sefick <ssefick_at_gmail.com>
>>> wrote:
>>>
>>>> To get some sort of frequency which in your case seem to be cycles per
>>>> distance? Is a valid use of a fourier transform as long as it is a distance
>>>> that is measured in a way that would be analogous to a time series- In
>>>> other words if the distance proceeds from an origin in one direction-
>>>> geophysicists do this often with the realization of an earthquake picked up
>>>> by sensors that are a distance away from the origin of the epicenter, but
>>>> they are looking for coherencies in the signal from one place to the next in
>>>> the frequency domain seperated by distance- this is called beam forming-
>>>> They use the raw signal- by binning (making a histogram) the data you are
>>>> loosing the signal- you are looking at frequency of occurance of certain
>>>> values not for the underlying periodicities of the data (in time or
>>>> space). You are fitting cos and isin function to you data to see if there
>>>> is periodicity- the power is the integration of the convolution of this sin
>>>> and cosine function with your data- It seems to me meaningless to preform
>>>> this convolution agianst something that is not a signal (the histogram). If
>>>> you want to use a frequency domain technique you have to have a frequency to
>>>> investigate- a histogram does not have this- I is a frequency of occurance
>>>> by bin size which is NOT what you want (your would have cycles/binlength
>>>> that doesn't make any sense to me) to do this analysis on- You want a
>>>> signal- dissolved oxygen curve, sunspot record, etc. through time, or
>>>> distance as stated above- you are looking for the frequency of a waveform-
>>>> Anyway, I may be misunderstanding- supply some code and explain the data
>>>> otherwise this line of though- in my limited expertise- is a dead end, but
>>>> agian I still don't know what it is that you are, exactly, trying to do- and
>>>> what your dataset constits. I hope these ruminations help
>>>>
>>>> I recommend doing this analysis on the raw data- It doesn't matter that
>>>> you don't have the same amount of data points- as long as both sets of data
>>>> have circa ten times the length of (cycles/distance) what you want to
>>>> detect- If things in your case are spaced by one meter then the lowest
>>>> cycle perdistance that you can reliably detect if 0.5 meters, this is all
>>>> speculation because you don't have a problem with reproducible code, and we
>>>> have no idea what you are measuring or what your data looks like- without
>>>> this information there is no way that I can say one way or the other that
>>>> you approach (suggested non-histogram) would be right or wrong.
>>>>
>>>> Stephen
>>>>
>>>>
>>>> On Mon, Jun 16, 2008 at 9:33 AM, Anthony Mathelier <
>>>> anthony.mathelier_at_gmail.com> wrote:
>>>>
>>>>> Perhaps I'm applying spec.pgram wrong as you said. I will explain what
>>>>> I want, so you can tell me why I'm wrong and perhaps what I have to do to do
>>>>> it well.
>>>>> I have some points in a 1-D space and I want to know if they are spaced
>>>>> at a certain periodic distance. So, I computed all the distances between
>>>>> points in my space. Then, I would like to know if a certain distance
>>>>> (period), or multiples of a certain distance, is preferred to space my data.
>>>>> I made a histogram of the distances and apply the spec.pgram function to
>>>>> know the frequence (so the period) which is the most important to space the
>>>>> original data.
>>>>> But, when I have to sets of data (without necessarily the same number
>>>>> of observation in each set), I want to compare the importance of the period
>>>>> given by spec.pgram between the sets. Could I normalize the amplitude of the
>>>>> peaks given by spec.pgram?
>>>>> So, am I wrong to apply this methodology to exhibit a periodic distance
>>>>> between my data? If, true, what could you recommend me to do this?
>>>>> Thanks in advance for your answers.
>>>>> Best regards,
>>>>>
>>>>> Anthony
>>>>>
>>>>> On Tue, Jun 10, 2008 at 6:13 PM, stephen sefick <ssefick_at_gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I from a first thought I would say that you are apply this wrong! The
>>>>>> fourier transform convolves a function (cos(x)+isin(x) (this may not be the
>>>>>> exact formula but I don't have my books near)) to the data and then
>>>>>> integrates over -1/2 to 1/2 takes the modulus and plots this- the
>>>>>> periodogram. The reason you preform a fourier transform is to look at
>>>>>> recurring frequencies in the data, which are in the time domain. The
>>>>>> fourier transform converts the time series into the frequency domain and
>>>>>> viola you have a peak into the hidden/recurring parts of your signal. From
>>>>>> your explaination your are applying this technique wrong- look at schumway,
>>>>>> MASS4, et al. books to get a handle on how this technique is used. If you
>>>>>> are to apply a time series analysis please use it on a time series. Maybe
>>>>>> your logic is not flawed but I don't see how a histogram with its associated
>>>>>> binning is a better candidate for time series analysis than the original
>>>>>> time series if at all.
>>>>>> good luck
>>>>>>
>>>>>> Stephen
>>>>>>
>>>>>> On Tue, Jun 10, 2008 at 8:49 AM, Matthieu Stigler <
>>>>>> Matthieu.Stigler_at_gmail.com> wrote:
>>>>>>
>>>>>>> Hello
>>>>>>>
>>>>>>> I don't know exactly what you want to do but:
>>>>>>>
>>>>>>> -why do you use in your example h$counts and not h? Furthermore helpl
>>>>>>> file says it should be a time series, why then rather not your time series?
>>>>>>>
>>>>>>> -usually na.action will make the "default" action, which you can see
>>>>>>> by getOptions("na.action")
>>>>>>>
>>>>>>> -here in this function it is provided in the function values
>>>>>>> na.action = na.fail so it will just remove the NA in the time series
>>>>>>>
>>>>>>> -if you want to study a function, I advise you to copy it entirely,
>>>>>>> rename it and then just insert print(curiousobject...) in the function, this
>>>>>>> will allow you to let the function run and grasp the interessting objects,
>>>>>>> like:
>>>>>>>
>>>>>>> study<-function (x, spans = NULL, kernel = NULL, taper = 0.1, pad =
>>>>>>> 0, fast = TRUE, demean = FALSE, detrend = TRUE, plot = TRUE,
>>>>>>> na.action = na.fail, ...)
>>>>>>> {
>>>>>>> series <- deparse(substitute(x))
>>>>>>> x <- na.action(as.ts(x))
>>>>>>> print(x)
>>>>>>> xfreq <- frequency(x)
>>>>>>> ...}
>>>>>>> study(sunspots)
>>>>>>>
>>>>>>> -when you provide an example, instead of giving an external reference
>>>>>>> for the data, try to search a convenient internal data (accessed by data()
>>>>>>> ), so one will be able to reproduce your problems. Here you could use
>>>>>>> sunspots
>>>>>>>
>>>>>>> -to obtain the commented code... I don't know it...
>>>>>>>
>>>>>>> -good luck
>>>>>>>
>>>>>>> Matthieu
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hi everyone,
>>>>>>>>
>>>>>>>> first of all, I would like to say that I am a newbie in R, so I
>>>>>>>> apologize in
>>>>>>>> advance if my questions seem to be too easy for you.
>>>>>>>>
>>>>>>>> Well, I'm looking for periodicity in histograms. I have histograms
>>>>>>>> of
>>>>>>>> certain phenomenons and I'm asking whether a periodicity exists in
>>>>>>>> these
>>>>>>>> data. So, I make a periodogram with the function spec.pgram. For
>>>>>>>> instance,
>>>>>>>> if I have a histogram h, I call spec.pgram by spec.pgram (h,
>>>>>>>> log="no",
>>>>>>>> taper=0.5). So, I have some peaks that appear and I would like to
>>>>>>>> interpret
>>>>>>>> them but I do not know how they are computed and so what a peak with
>>>>>>>> a value
>>>>>>>> of 10000 represents in comparison with a peak of value 600 with
>>>>>>>> another
>>>>>>>> histogram.
>>>>>>>> I looked at the source code of the function spec.pgram to better
>>>>>>>> understand
>>>>>>>> what is behind. But, when I apply the source code line by line, I've
>>>>>>>> got a
>>>>>>>> problem. For instance, I make:
>>>>>>>>
>>>>>>>>
>>>>>>>>> >data = scan ("file.txt")
>>>>>>>>> >h = hist (data, breaks=max(data)/5000)
>>>>>>>>>
>>>>>>>>>
>>>>>>>> #then I apply the first two lines of the spec.pgram function
>>>>>>>>
>>>>>>>>
>>>>>>>>> >series <- deparse(substitute(h$counts))
>>>>>>>>> >x <- na.action(as.ts(h$counts))
>>>>>>>>> >x
>>>>>>>>>
>>>>>>>>>
>>>>>>>> NULL
>>>>>>>> I do not understand why when I apply the first two lines of the
>>>>>>>> function I
>>>>>>>> have x which is equal to NULL (which make a mistake in the following
>>>>>>>> lines
>>>>>>>> of the code) but if I apply the function directly with h$counts it
>>>>>>>> gives me
>>>>>>>> a result.
>>>>>>>> So, if someone can explain to me what is the problem and/or how
>>>>>>>> spec.pgram
>>>>>>>> exactly computes the periodogram and how to interpret it with my
>>>>>>>> data, I
>>>>>>>> would be so grateful.
>>>>>>>> And subsidiary questions:
>>>>>>>> - Is it possible to have the commented source code of the function?
>>>>>>>> - I do not understand what is the function na.action in the second
>>>>>>>> line of
>>>>>>>> spec.pgram, so if you can explain it to me.
>>>>>>>>
>>>>>>>> Thanks in advance for your answers.
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>> Anthony Mathelier
>>>>>>>>
>>>>>>>> [[alternative HTML version deleted]]
>>>>>>>>
>>>>>>>
>>>>>>> ______________________________________________
>>>>>>> R-help_at_r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> PLEASE do read the posting guide
>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Let's not spend our time and resources thinking about things that are
>>>>>> so little or so large that all they really do for us is puff us up and make
>>>>>> us feel like gods. We are mammals, and have not exhausted the annoying
>>>>>> little problems of being mammals.
>>>>>>
>>>>>> -K. Mullis
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Let's not spend our time and resources thinking about things that are so
>>>> little or so large that all they really do for us is puff us up and make us
>>>> feel like gods. We are mammals, and have not exhausted the annoying little
>>>> problems of being mammals.
>>>>
>>>> -K. Mullis
>>>>
>>>
>>>
>>
>>
>> --
>> Let's not spend our time and resources thinking about things that are so
>> little or so large that all they really do for us is puff us up and make us
>> feel like gods. We are mammals, and have not exhausted the annoying little
>> problems of being mammals.
>>
>> -K. Mullis
>>
>
>
>
> --
> Let's not spend our time and resources thinking about things that are so
> little or so large that all they really do for us is puff us up and make us
> feel like gods. We are mammals, and have not exhausted the annoying little
> problems of being mammals.
>
> -K. Mullis
>

        [[alternative HTML version deleted]]



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Tue 17 Jun 2008 - 16:04:21 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 17 Jun 2008 - 16:30:41 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive