Re: Macro Loop and Lag

snowflake · Posted 04-18-2013 05:19 AM

Dear all

Here is my question:

I've the following dataset:

Id var1 var2 var3 var4 var5

1 45 66 88 888 99

1 35 36 98 254 345

1 546 5 51 564 .

2 654 87 54 654 651

2 546 87 861 651 651

2 215 55 654 896 514

2 654 51 678 651 215

=> for every id I have several observations, but the number of observations per id varies.

Now I would like to calculate all pairwise differences within every id .

I found out that I could do this using proc transpose to put the dataset in a "wide" format and than use a loop over the transposed variables.

Before I found that solution I wondered if it is possible to do that only with loops and lag-function?

That idea brought me to the following problems:

1. Is it possible to have a macro call from within a data step with a data set value as an argument and execute that code immediately, not after the data step is executed. I know that you can use "call execute" to do a macro call with a data set value as argument, but the problem is that the generated code is only executed after the data step is finished.

2. Is there another solution to the above described problem without transposing the data?

Thanks a lot for your help.

Keith · Posted 04-18-2013 05:22 AM

It would be helpful if you could post what your output dataset should look like, based on the example data you have provided.

UrvishShah · Posted 04-18-2013 05:38 AM

Can you please post the sample output you want...In your Question you want pairwise differences within each ID...

Can you please define pairwise differences ???

snowflake · Posted 04-18-2013 05:53 AM

I want to calculate all possible differences or to put it in other words compare all possible pairs of two values. As if for every ID I had a matrix (number_observations_for_id, 4) and I run through all rows and all columns and compare all possible pairs.

LinusH · Posted 04-18-2013 06:15 AM

This is not neat but should work:

Transpose the table so you only have just one value per row
Join the table with itself on id. Since there are multiple rows per id - a M-M join will occur.
Calculate the difference between the values from respective alias table

With this scenario you will also have a value minus itself calculated. To avoid that, ad a row no to the transposed table, and add a table_a.row_no <> table_b.row_no to the join criteria.

Data never sleeps

UrvishShah · Posted 04-18-2013 07:01 AM

Try following macro...I am not sure about your requirement...like you said you whould like to compare all possible pairs of two values...It means that you want to have comparision of all the observations...If it is like that than try following code...

%macro test;

%do i = 1 %to 2; /*I take stop value based on your unique ID values*/

data test;

set test;

_diff_var_&i. = dif(var&i.);

run;

%end;

%mend;

%test

This will compare just observations for all the variables...

Again it will be good if you can post your output which you want to have...

-Urvish

snowflake · Posted 04-18-2013 08:03 AM

Hi Urvish

Thanks for your reply. The problem with your solution ist that it only calculates the difference between the variable and with one lag of it.

What I would like is building all possible pairs within an id:

Id var1 var2 var3 var4 var5

1 45 66 88 888 99

1 35 36 98 254 345

1 546 5 51 564 22

2 654 87 54 654 651

2 546 87 861 651 651

2 215 55 654 896 514

2 654 51 678 651 215

Id dif1112 dif1113 dif1114 dif1115 dif1121 dif1122 ....

1 (45-66) (45-88) (45-888) (45-888) (45-35) (45-36) ...

So for id=1 I have 15*14 differences for all possible combinations.

I know I can do it with proc transpose and loop over the variables. I just wanted to know if there is another way. Especially combing a loop and lag.

Peter_C · Posted 04-18-2013 09:46 AM

The problem is more easily solved (at pair level) with a "self join" in PROC SQL

Peter_C · Posted 04-18-2013 07:15 PM

tested, here is the pair-wise join in SQL

it creates 500 rows

data starting ;

input

Id var1 var2 var3 var4 var5;

list;cards;

1 45 66 88 888 99

1 35 36 98 254 345

1 546 5 51 564 .

2 654 87 54 654 651

2 546 87 861 651 651

2 215 55 654 896 514

2 654 51 678 651 215

;

data starting2 ;

set starting ;

array vars (*) var: ;

do varn = 1 to dim(vars) ;

var = vars(varn) ;

output ;

end ;

keep id varn var ;

run ;

proc sql ;

create table start2 as

select a.id, a.var as left, b.var as right, a.varn as leftN, b.varN as rightN

from starting2 as a

join starting2 as b

on a.id eq b.id

and a.varn ne b.varn

;

quit ;

Astounding · Posted 04-18-2013 12:19 PM

Here's an approach (although I definitely did not dot all the i's or cross all the t's). It requires that you find out the maximum number of records for a single ID. For illustration purposes, I'll use 4 as the max. It also names the variables dif_001 - dif_190. You could always spell out the names in the proper order (which is not 100% clear and might be the hardest part of the whole task). Finally, it assumes that if you have taken the equivalent of dif1112, you do not need the same calculation in reverse order which would turn out to be dif1211. After the long worded introduction, load a single dimensional array. Then calculating all differences is fairly easy.

data want;

array values {20} _temporary_;

array diffs {190} dif_001 - dif_190;

array vars {5} var1-var5;

n_rows=0;

do until (last.id);

set have;

by id;

n_rows + 1;

do _n_=1 to 5;

values{5*(n_rows-1) + _n_} = vars{_n_};

end;

_n_=0;

do _i_=1 to 5*n_rows - 1;

do _k_ = _i_ + 1 to 5*n_rows;

_n_ + 1;

diffs{_n_} = values{_i_} - values{_k_};

end;

run;

All of this is untested code. The naming difficulties might be easier if you were to use a two-dimensional array. But the first step is to see if the approach makes sense to you.

Good luck.

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away