Hi,
I have two datasets
| Original | ||
| Date | x | y |
| 15-08-2021 | 1 | 7 |
| 16-08-2021 | 2 | 4 |
| 17-08-2021 | 3 | 3 |
| 18-08-2021 | 4 | 3 |
| Final | ||
| Date | x | y |
| 15-08-2021 | 8 | 6 |
| 16-08-2021 | 7 | 3 |
| 17-08-2021 | 6 | 2 |
| 18-08-2021 | 4 | 1 |
I would like to find the correlation between x of original and x of final. Similarly for each corresponding pair.
Please advise me how to do this.
Jijil Ramakrishnan
You can use a double transpose to put a prefix in front of the variables if your variable names are not so long that additional characters won't make the names too long.
Example adding Final_ as prefix:
data junk;
input a $ x y;
datalines;
a 1 2
b 11 22
c 111 222
;
Proc transpose data=junk out=trans
;
by a;
run;
proc transpose data=trans out=finalwant (drop=_name_)
Prefix=Final_;
by a;
id _name_;
var col1;
run;
This will only rename numeric variables which for Proc Corr is sufficient because you cannot calculate correlation for character valued variables.
You could do the same thing, obviously using a different output name for the second Proc Transpose with a different prefix for the "original" data.
The merge would not need the rename step now.
In the Proc Corr you could use a prefix variable list in the Var and With statements. Assuming you used "Old_" as the prefix for the original data you would use
Proc corr data=merged; var Old_: ; with Final_: ; run;
The construct Old_: with the colon immediately after a prefix tells SAS to use all the variables that start with that prefix almost any place a list of variables can appear.
@JAR wrote:
Hi,
I have two datasets
Original Date x y 15-08-2021 1 7 16-08-2021 2 4 17-08-2021 3 3 18-08-2021 4 3 Final Date x y 15-08-2021 8 6 16-08-2021 7 3 17-08-2021 6 2 18-08-2021 4 1
I would like to find the correlation between x of original and x of final. Similarly for each corresponding pair.
Please advise me how to do this.
Jijil Ramakrishnan
I think you are looking for something like:
/* assumes 1) no duplicates of date
2) both the data sets are sorted by date
*/
data need;
merge original
final (rename=(x=xfinal y=yfinal))
;
by date;
run;
proc corr data=need;
var x y;
with xfinal yfinal;
run;
However if you have multiples for the date variable in both data sets you need to provide a more complete example of both data sets.
You can use a double transpose to put a prefix in front of the variables if your variable names are not so long that additional characters won't make the names too long.
Example adding Final_ as prefix:
data junk;
input a $ x y;
datalines;
a 1 2
b 11 22
c 111 222
;
Proc transpose data=junk out=trans
;
by a;
run;
proc transpose data=trans out=finalwant (drop=_name_)
Prefix=Final_;
by a;
id _name_;
var col1;
run;
This will only rename numeric variables which for Proc Corr is sufficient because you cannot calculate correlation for character valued variables.
You could do the same thing, obviously using a different output name for the second Proc Transpose with a different prefix for the "original" data.
The merge would not need the rename step now.
In the Proc Corr you could use a prefix variable list in the Var and With statements. Assuming you used "Old_" as the prefix for the original data you would use
Proc corr data=merged; var Old_: ; with Final_: ; run;
The construct Old_: with the colon immediately after a prefix tells SAS to use all the variables that start with that prefix almost any place a list of variables can appear.
Do you really want to "correlate" them? Or just check for differences?
If the later look at PROC COMPARE
proc compare data=original compare=final;
id date;
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.