Dear all
Here is my question:
I've the following dataset:
Id var1 var2 var3 var4 var5
1 45 66 88 888 99
1 35 36 98 254 345
1 546 5 51 564 .
2 654 87 54 654 651
2 546 87 861 651 651
2 215 55 654 896 514
2 654 51 678 651 215
=> for every id I have several observations, but the number of observations per id varies.
Now I would like to calculate all pairwise differences within every id .
I found out that I could do this using proc transpose to put the dataset in a "wide" format and than use a loop over the transposed variables.
Before I found that solution I wondered if it is possible to do that only with loops and lag-function?
That idea brought me to the following problems:
1. Is it possible to have a macro call from within a data step with a data set value as an argument and execute that code immediately, not after the data step is executed. I know that you can use "call execute" to do a macro call with a data set value as argument, but the problem is that the generated code is only executed after the data step is finished.
2. Is there another solution to the above described problem without transposing the data?
Thanks a lot for your help.
It would be helpful if you could post what your output dataset should look like, based on the example data you have provided.
Can you please post the sample output you want...In your Question you want pairwise differences within each ID...
Can you please define pairwise differences ???
I want to calculate all possible differences or to put it in other words compare all possible pairs of two values. As if for every ID I had a matrix (number_observations_for_id, 4) and I run through all rows and all columns and compare all possible pairs.
This is not neat but should work:
With this scenario you will also have a value minus itself calculated. To avoid that, ad a row no to the transposed table, and add a table_a.row_no <> table_b.row_no to the join criteria.
Try following macro...I am not sure about your requirement...like you said you whould like to compare all possible pairs of two values...It means that you want to have comparision of all the observations...If it is like that than try following code...
%macro test;
%do i = 1 %to 2; /*I take stop value based on your unique ID values*/
data test;
set test;
_diff_var_&i. = dif(var&i.);
run;
%end;
%mend;
%test
This will compare just observations for all the variables...
Again it will be good if you can post your output which you want to have...
-Urvish
Hi Urvish
Thanks for your reply. The problem with your solution ist that it only calculates the difference between the variable and with one lag of it.
What I would like is building all possible pairs within an id:
Id var1 var2 var3 var4 var5
1 45 66 88 888 99
1 35 36 98 254 345
1 546 5 51 564 22
2 654 87 54 654 651
2 546 87 861 651 651
2 215 55 654 896 514
2 654 51 678 651 215
Id dif1112 dif1113 dif1114 dif1115 dif1121 dif1122 ....
1 (45-66) (45-88) (45-888) (45-888) (45-35) (45-36) ...
So for id=1 I have 15*14 differences for all possible combinations.
I know I can do it with proc transpose and loop over the variables. I just wanted to know if there is another way. Especially combing a loop and lag.
The problem is more easily solved (at pair level) with a "self join" in PROC SQL
tested, here is the pair-wise join in SQL
it creates 500 rows
data starting ;
input
Id var1 var2 var3 var4 var5;
list;cards;
1 45 66 88 888 99
1 35 36 98 254 345
1 546 5 51 564 .
2 654 87 54 654 651
2 546 87 861 651 651
2 215 55 654 896 514
2 654 51 678 651 215
;
data starting2 ;
set starting ;
array vars (*) var: ;
do varn = 1 to dim(vars) ;
var = vars(varn) ;
output ;
end ;
keep id varn var ;
run ;
proc sql ;
create table start2 as
select a.id, a.var as left, b.var as right, a.varn as leftN, b.varN as rightN
from starting2 as a
join starting2 as b
on a.id eq b.id
and a.varn ne b.varn
;
quit ;
Here's an approach (although I definitely did not dot all the i's or cross all the t's). It requires that you find out the maximum number of records for a single ID. For illustration purposes, I'll use 4 as the max. It also names the variables dif_001 - dif_190. You could always spell out the names in the proper order (which is not 100% clear and might be the hardest part of the whole task). Finally, it assumes that if you have taken the equivalent of dif1112, you do not need the same calculation in reverse order which would turn out to be dif1211. After the long worded introduction, load a single dimensional array. Then calculating all differences is fairly easy.
data want;
array values {20} _temporary_;
array diffs {190} dif_001 - dif_190;
array vars {5} var1-var5;
n_rows=0;
do until (last.id);
set have;
by id;
n_rows + 1;
do _n_=1 to 5;
values{5*(n_rows-1) + _n_} = vars{_n_};
end;
end;
_n_=0;
do _i_=1 to 5*n_rows - 1;
do _k_ = _i_ + 1 to 5*n_rows;
_n_ + 1;
diffs{_n_} = values{_i_} - values{_k_};
end;
end;
run;
All of this is untested code. The naming difficulties might be easier if you were to use a two-dimensional array. But the first step is to see if the approach makes sense to you.
Good luck.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.