Re: Correlations between consecutive observations (grouped by individual)

Wafflecakes · Posted 02-02-2020 06:03 PM

Hi,

Suppose I have the following dataset

id  year  measure1  measure2
1   2000  0.41      -
1   2001  0.19      0.50
1   2002  0.51      0.75
1   2003  0.91      0.29
2   2000  0.69      -
2   2001  0.40      0.69
2   2002  0.69      0.29
2   2003  0.79      0.39

How would I do a correlation between consecutive years (grouped by id)? e.g. Correlation between measure1 in 2001 with measure1 in 2000, measure1 in 2002 with measure1 in 2001, etc.

Thank you,

SASKiwi · Posted 02-02-2020 06:26 PM

@Wafflecakes - What is your definition of "correlation"?

Wafflecakes · Posted 02-02-2020 06:27 PM

Thank you for your question SASKiwi. Pearson's correlation as both measure1 and measure2 are continuous.

Reeza · Posted 02-02-2020 11:31 PM

You can look at autocorrelation, which is typically used for time series analysis. Note that the formula is slightly different from regular (Pearson) correlation, which matters if you're trying to verify answers or if you don't have SAS/ETS.
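To make the difference in formulas concrete, here are the usual textbook definitions for one series x_1, ..., x_T. This is a sketch for checking answers by hand, not a statement of exactly what any SAS procedure computes. The symbols \bar{x}_{(1)} and \bar{x}_{(2)} are notation introduced here for the means of x_1..x_{T-1} and x_2..x_T.

```latex
% Lag-1 sample autocorrelation: one overall mean, full-series variance in the denominator
r_1 = \frac{\sum_{t=2}^{T} (x_t - \bar{x})(x_{t-1} - \bar{x})}
           {\sum_{t=1}^{T} (x_t - \bar{x})^2}

% Pearson correlation of the pairs (x_{t-1}, x_t), t = 2..T:
% each member of the pair is centered and scaled by its own sub-series
\rho = \frac{\sum_{t=2}^{T} (x_t - \bar{x}_{(2)})(x_{t-1} - \bar{x}_{(1)})}
            {\sqrt{\sum_{t=2}^{T} (x_t - \bar{x}_{(2)})^2 \,
                   \sum_{t=2}^{T} (x_{t-1} - \bar{x}_{(1)})^2}}
```

For long series the two are usually close; for very short series (such as four years per id) they can differ noticeably.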

PGStats · Posted 02-02-2020 10:42 PM

If you have access to SAS/ETS, you could use proc autoreg, as in:


proc autoreg data=have;
class id;
model measure1 = id / nlag=1;
run;

PG

Wafflecakes · Posted 02-03-2020 12:33 PM

Thank you PGStats. It seems to me that the autoreg procedure is for running a linear regression model for time series data as opposed to correlation? Can you elaborate?

PGStats · Posted 02-03-2020 11:11 PM

Yes, autoreg performs regressions. But it relaxes the requirement that errors (residuals) be independent. Instead, the errors may be serially correlated. Part of the method involves estimating the correlation that exists between consecutive observations. This is the measure you are looking for.

Specifying the model as measure1 = id gets the procedure to remove the mean from each id series to get the residuals.

PG

Wafflecakes · Posted 02-12-2020 07:46 AM

Unfortunately I get the above error message when I try to use proc autoreg in SAS version 9.4. Is there a workaround for this? Perhaps another method for Pearson's correlation?
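One possible workaround that needs only Base SAS is to pair each observation with the previous one from the same id and pass the pair to proc corr. This is a rough sketch, assuming a long dataset named have with numeric variables id, year and measure1. The names lagged and prev_measure1 are made up for illustration.

```sas
/* BY-group processing requires the data sorted by id, then year */
proc sort data=have;
    by id year;
run;

data lagged;
    set have;
    by id;
    /* LAG must be called on every row so its queue stays in step */
    prev_measure1 = lag(measure1);
    /* the first year of each id has no predecessor within that id */
    if first.id then prev_measure1 = .;
run;

/* Pearson correlation between each value and the previous year's value */
proc corr data=lagged pearson;
    var measure1;
    with prev_measure1;
run;
```

This pools all ids and years into a single correlation. If a separate correlation per year is wanted, sorting lagged by year and adding a BY year statement to proc corr should give one result per year, computed across ids.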

Wafflecakes · Posted 02-18-2020 09:15 PM

I have tried proc corr as an alternative to this approach, but it did not give me the intended result.

Essentially, my data is structured as follows...I created x2000, x2001, x2002, and x2003 to equal x in the corresponding years.

id  year  x     x2000  x2001  x2002  x2003
1   2000  0.59  0.59   NA     NA     NA
1   2001  0.69  NA     0.69   NA     NA
1   2002  0.19  NA     NA     0.19   NA
1   2003  0.39  NA     NA     NA     0.39

I used the following code to calculate correlations between consecutive years:

proc corr data = dataset;
   var x2000 x2001;
run;

However, when I run the code, I do not get the Pearson's correlation between x2000 and x2001.

Does anybody have any recommendations?

Reeza · Posted 02-18-2020 09:28 PM

Your data isn't structured to be analyzed in that form. The NAs alone mean the variables would be character, unless you're writing NA solely for the forum.

But I can only see what you post so I have to go off what you've shown.

If you look at the PROC CORR documentation and its examples, which include the full code, you can see how your data should be structured for correlations.

Wafflecakes · Posted 02-19-2020 10:02 PM

Thank you, Reeza. Correct - the NA was done on purpose to represent missing data. Could you suggest an example? I do not see any that match my case, and I do not understand why proc corr would not work with the way I have structured my data.

Reeza · Posted 02-19-2020 10:41 PM

It will not work because SAS removes rows that have missing values from the calculation. In your layout that leaves at most a single observation for each pair of variables, and a correlation can't be computed from that mathematically.
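A layout that proc corr can use has one row per id and one column per year, so that each row carries both members of a consecutive-year pair. The following is a minimal sketch, assuming a long dataset named have with numeric id, year and measure1. The output name wide and the prefix x are choices made here for illustration.

```sas
/* PROC TRANSPOSE with a BY statement needs the data sorted by id */
proc sort data=have;
    by id year;
run;

/* One row per id; columns x2000, x2001, ... hold measure1 for each year */
proc transpose data=have out=wide(drop=_name_) prefix=x;
    by id;
    id year;
    var measure1;
run;

/* Correlation matrix across ids; consecutive-year pairs sit just off the diagonal */
proc corr data=wide pearson;
    var x2000-x2003;
run;
```

With only two ids, as in the sample data, each correlation is computed from two points and so carries no real information. The approach only becomes meaningful with many ids.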

Reeza · Posted 02-19-2020 10:47 PM

Example of correlation analysis using the TIMESERIES procedure:

https://documentation.sas.com/?docsetId=etsug&docsetVersion=15.1&docsetTarget=etsug_timeseries_examp...

Here are instructions on how to provide sample data as a data step:
https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...
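For reference, the sample data from the first post could be written as a data step along these lines. This is only a sketch: the dataset name have is borrowed from the proc autoreg example earlier in the thread, and the "-" entries are entered as "." on the assumption that they stand for missing numeric values.

```sas
/* Sample data from the first post; "." is a missing numeric value */
data have;
    input id year measure1 measure2;
    datalines;
1 2000 0.41 .
1 2001 0.19 0.50
1 2002 0.51 0.75
1 2003 0.91 0.29
2 2000 0.69 .
2 2001 0.40 0.69
2 2002 0.69 0.29
2 2003 0.79 0.39
;
run;
```

Posting data in this form lets others run code against exactly the same values.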

Wafflecakes · Posted 02-23-2020 08:48 PM

Thank you, Reeza, for your recommendation.

I coded it as follows:

proc timeseries data = test out = out outcorr = timedomain;
   by id;                /* one analysis per id */
   corr / nlag = 1;      /* autocorrelation statistics up to lag 1 */
   var measure1;
run;

I am not quite sure what the purpose of outcorr is and whether I have specified the nlag correctly (I am trying to do a correlation between the value of measure1 in each year with the corresponding previous year).

In the timedomain dataset that is output, which specific variable actually tells me what the correlation is?
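As far as I recall from the PROC TIMESERIES documentation, the OUTCORR= dataset holds one row per BY group, variable and lag. The variable names used below (_NAME_, LAG, N, ACF) are from memory and should be checked against the documentation or with proc contents. On that assumption, the lag-1 autocorrelation for each id would be the ACF value on the rows where LAG = 1, which this sketch prints:

```sas
/* Lag-1 autocorrelation of measure1 for each id, one row per id */
proc print data=timedomain noobs;
    where lag = 1;
    var id _name_ lag n acf;
run;
```

The rows with LAG = 0 should show an ACF of 1, since a series is perfectly correlated with itself. Note that this is the time-series autocorrelation within each id, which is not the same quantity as a Pearson correlation computed across ids for two given years.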

Wafflecakes · Posted 03-09-2020 09:08 PM

Hello,

Following up on this. I wonder if anyone has any thoughts about how to calculate these correlations?
