Hi,
Suppose I have the following dataset
id year measure1 measure2 1 2000 0.41 - 1 2001 0.19 0.50 1 2002 0.51 0.75 1 2003 0.91 0.29 2 2000 0.69 - 2 2001 0.40 0.69 2 2002 0.69 0.29 2 2003 0.79 0.39
How would I do a correlation between consecutive years (grouped by id)? e.g. Correlation between measure1 in 2001 with measure1 in 2000, measure1 in 2002 with measure1 in 2001, etc.
Thank you,
@Wafflecakes - What is your definition of "correlation"?
If you have access to SAS/ETS, you could use proc autoreg, as in:
proc autoreg data=have;
class id;
model measure1 = id / nlag=1;
run;
Thank you PGStats. It seems to me that the autoreg procedure is for running a linear regression model for time series data as opposed to correlation? Can you elaborate?
Yes, autoreg performs regressions. But it relaxes the requirement that errors (residuals) be independent. Instead, the errors may be serially correlated. Part of the method involves estimating the correlation that exists between consecutive observations. This is the measure you are looking for.
Specifying the model as measure1 = id gets the procedure to remove the mean from each id series to get the residuals.
Unfortunately I get the above error message when I try to use proc autoreg in SAS version 9.4. Is there a workaround to this? Perhaps another pearson's correlation method?
I have tried proc corr as an alternative to this approach, but it did not give me the intended result.
Essentially, my data is structured as follows...I created x2000, x2001, x2002, and x2003 to equal x in the corresponding years.
id year x x2000 x2001 x2002 x2003
1 2000 0.59 0.59 NA NA NA
1 2001 0.69 NA 0.69 NA NA
1 2002 0.19 NA NA 0.19 NA
1 2003 0.39 NA NA NA 0.39
I used the following code to calculate correlations between consecutive years with this code:
proc corr data = dataset;
var x2000 x2001;
run;
However, when I run the code, I do not get the pearson's correlations between years x2000 and x2001.
Does anybody have any recommendations?
Your data isn't structured to be analyzed in that form. The NA's alone mean character unless you're doing that solely for the forum.
But I can only see what you post so I have to go off what you've shown.
If you look at PROC CORR and the examples, which have the full code you can see how your data should be structured for correlations.
@Wafflecakes wrote:
I have tried proc corr as an alternative to this approach, but it did not give me the intended result.
Essentially, my data is structured as follows...I created x2000, x2001, x2002, and x2003 to equal x in the corresponding years.
id year x x2000 x2001 x2002 x2003
1 2000 0.59 0.59 NA NA NA
1 2001 0.69 NA 0.69 NA NA
1 2002 0.19 NA NA 0.19 NA
1 2003 0.39 NA NA NA 0.39
I used the following code to calculate correlations between consecutive years with this code:
proc corr data = dataset;
var x2000 x2001;
run;
However, when I run the code, I do not get the pearson's correlations between years x2000 and x2001.
Does anybody have any recommendations?
Thank you, Reeza. Correct - the NA was done on purpose to represent missing data. May you suggest an example? I do not see any that would match my case and do not understand why proc corr would not work with the way I have structured my data.
Example of correlation analysis using TIMESERIES procedures
Here are instructions on how to provide sample data as a data step:
https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...
Thank you, Reeza, for your recommendation.
I coded it as follows:
proc timeseries data = test out = out outcorr = timedomain;
by id;
corr /nlag = 1
var measure1;
run;
I am not quite sure what the purpose of outcorr is and whether I have specified the nlag correctly (I am trying to do a correlation between the value of measure1 in each year with the corresponding previous year).
In the timedomain dataset that is outputted, which specific variable actually tells me what the correlation is?
Hello,
Following up on this. I wonder if anyone has any thoughts about how to calculate these correlations?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.