DATA Step, Macro, Functions and more

Macro Loop and Lag

Reply
Occasional Contributor
Posts: 8

Macro Loop and Lag


Dear all

Here is my question:

I've the following dataset:

Id   var1 var2 var3 var4 var5

1    45   66   88   888  99

1    35   36   98   254 345

1   546  5     51   564  .

2   654  87 54    654  651

2   546  87  861 651  651

2   215  55  654  896 514

2  654  51  678  651  215

=> for every id I have several observations, but the number of observations per id varies.

Now I would like to calculate all pairwise differences within every id .

I found out that I could do this using proc transpose to put the dataset in a "wide" format and than use a loop over the transposed variables.

Before I found that solution I wondered if it is possible to do that only with loops and lag-function?

That idea brought me to the following problems:

1. Is it possible to have a macro call from within a data step with a data set value as an argument and execute that code immediately, not after the data step is executed. I know that you can use "call execute" to do a macro call with a data set value as argument, but the problem is that the generated code is only executed after the data step is finished.

2. Is there another solution to the above described problem without transposing the data?

Thanks a lot for your help.

Regular Contributor
Posts: 151

Re: Macro Loop and Lag

It would be helpful if you could post what your output dataset should look like, based on the example data you have provided.

Regular Contributor
Posts: 195

Re: Macro Loop and Lag

Can you please post the sample output you want...In your Question you want pairwise differences within each ID...

Can you please define pairwise differences ???


Occasional Contributor
Posts: 8

Re: Macro Loop and Lag

I want to calculate all possible differences or to put it in other words compare all possible pairs of two values. As if for every ID I had a matrix (number_observations_for_id, 4) and I run through all rows and all columns and compare all possible pairs.  

Super User
Posts: 5,257

Re: Macro Loop and Lag

This is not neat but should work:

  1. Transpose the table so you only have just one value per row
  2. Join the table with itself on id. Since there are multiple rows per id - a M-M join will occur.
  3. Calculate the difference between the values from respective alias table

With this scenario you will also have a value minus itself calculated. To avoid that, ad a row no to the transposed table, and add a table_a.row_no <> table_b.row_no to the join criteria.

Data never sleeps
Regular Contributor
Posts: 195

Re: Macro Loop and Lag

Try following macro...I am not sure about your requirement...like you said you whould like to compare all possible pairs of two values...It means that you want to have comparision of all the observations...If it is like that than try following code...

%macro test;

    

     %do i = 1 %to 2; /*I take stop value based on your unique ID values*/

         data test;

           set test;

           _diff_var_&i. = dif(var&i.);

         run;

     %end;

%mend;

%test

This will compare just observations for all the variables...

Again it will be good if you can post your output which you want to have...

-Urvish

Occasional Contributor
Posts: 8

Re: Macro Loop and Lag

Hi Urvish

Thanks for your reply. The problem with your solution ist that it only calculates the difference between the variable and with one lag of it.

What I would like is building all possible pairs within an id:

Id   var1 var2 var3 var4 var5

1    45   66   88   888  99

1    35   36   98   254 345

1   546   5     51   564  22

2   654  87   54    654  651

2   546  87  861   651  651

2   215  55  654   896 514

2  654  51  678   651  215

Id dif1112  dif1113  dif1114       dif1115   dif1121  dif1122 .... 

1 (45-66)   (45-88)   (45-888)    (45-888)  (45-35)   (45-36)  ...      

So for id=1 I have 15*14 differences for all possible combinations.

I know I can do it with proc transpose and loop over the variables. I just wanted to know if there is another way. Especially combing a loop and lag.

Valued Guide
Posts: 2,175

Re: Macro Loop and Lag

The problem is more easily solved (at pair level) with a "self join" in PROC SQL

Valued Guide
Posts: 2,175

Re: Macro Loop and Lag

tested, here is the pair-wise join in SQL

it creates 500 rows

data starting ;

input

Id   var1 var2 var3 var4 var5;

list;cards;

1    45   66   88   888  99

1    35   36   98   254 345

1   546  5     51   564  .

2   654  87 54    654  651

2   546  87  861 651  651

2   215  55  654  896 514

2  654  51  678  651  215

;

data starting2 ;

set starting ;

  array vars (*) var: ;

  do varn = 1 to dim(vars) ;

     var = vars(varn) ;

     output ;

  end ;

keep id varn var ;

run ;

proc sql ;

create table start2 as

select a.id, a.var as left, b.var as right, a.varn as leftN, b.varN as rightN

from starting2 as a

join starting2 as b

  on a.id   eq b.id

and a.varn ne b.varn

;

quit ;

Super User
Posts: 5,085

Re: Macro Loop and Lag

Here's an approach (although I definitely did not dot all the i's or cross all the t's).  It requires that you find out the maximum number of records for a single ID.  For illustration purposes, I'll use 4 as the max.  It also names the variables dif_001 - dif_190.  You could always spell out the names in the proper order (which is not 100% clear and might be the hardest part of the whole task).  Finally, it assumes that if you have taken the equivalent of dif1112, you do not need the same calculation in reverse order which would turn out to be dif1211.  After the long worded introduction, load a single dimensional array.  Then calculating all differences is fairly easy.

data want;

   array values {20} _temporary_;

   array diffs {190} dif_001 - dif_190;

   array vars {5} var1-var5;

   n_rows=0;

   do until (last.id);

      set have;

      by id;

      n_rows + 1;

      do _n_=1 to 5;

         values{5*(n_rows-1) + _n_} = vars{_n_};

      end;

   end;

   _n_=0;

   do _i_=1 to 5*n_rows - 1;

      do _k_ = _i_ + 1 to 5*n_rows;

         _n_ + 1;

         diffs{_n_} = values{_i_} - values{_k_};

      end;

   end;

run;

All of this is untested code.  The naming difficulties might be easier if you were to use a two-dimensional array.  But the first step is to see if the approach makes sense to you.

Good luck.

Ask a Question
Discussion stats
  • 9 replies
  • 613 views
  • 0 likes
  • 6 in conversation