BookmarkSubscribeRSS Feed
snowflake
Fluorite | Level 6


Dear all

Here is my question:

I've the following dataset:

Id   var1 var2 var3 var4 var5

1    45   66   88   888  99

1    35   36   98   254 345

1   546  5     51   564  .

2   654  87 54    654  651

2   546  87  861 651  651

2   215  55  654  896 514

2  654  51  678  651  215

=> for every id I have several observations, but the number of observations per id varies.

Now I would like to calculate all pairwise differences within every id .

I found out that I could do this using proc transpose to put the dataset in a "wide" format and than use a loop over the transposed variables.

Before I found that solution I wondered if it is possible to do that only with loops and lag-function?

That idea brought me to the following problems:

1. Is it possible to have a macro call from within a data step with a data set value as an argument and execute that code immediately, not after the data step is executed. I know that you can use "call execute" to do a macro call with a data set value as argument, but the problem is that the generated code is only executed after the data step is finished.

2. Is there another solution to the above described problem without transposing the data?

Thanks a lot for your help.

9 REPLIES 9
Keith
Obsidian | Level 7

It would be helpful if you could post what your output dataset should look like, based on the example data you have provided.

UrvishShah
Fluorite | Level 6

Can you please post the sample output you want...In your Question you want pairwise differences within each ID...

Can you please define pairwise differences ???


snowflake
Fluorite | Level 6

I want to calculate all possible differences or to put it in other words compare all possible pairs of two values. As if for every ID I had a matrix (number_observations_for_id, 4) and I run through all rows and all columns and compare all possible pairs.  

LinusH
Tourmaline | Level 20

This is not neat but should work:

  1. Transpose the table so you only have just one value per row
  2. Join the table with itself on id. Since there are multiple rows per id - a M-M join will occur.
  3. Calculate the difference between the values from respective alias table

With this scenario you will also have a value minus itself calculated. To avoid that, ad a row no to the transposed table, and add a table_a.row_no <> table_b.row_no to the join criteria.

Data never sleeps
UrvishShah
Fluorite | Level 6

Try following macro...I am not sure about your requirement...like you said you whould like to compare all possible pairs of two values...It means that you want to have comparision of all the observations...If it is like that than try following code...

%macro test;

    

     %do i = 1 %to 2; /*I take stop value based on your unique ID values*/

         data test;

           set test;

           _diff_var_&i. = dif(var&i.);

         run;

     %end;

%mend;

%test

This will compare just observations for all the variables...

Again it will be good if you can post your output which you want to have...

-Urvish

snowflake
Fluorite | Level 6

Hi Urvish

Thanks for your reply. The problem with your solution ist that it only calculates the difference between the variable and with one lag of it.

What I would like is building all possible pairs within an id:

Id   var1 var2 var3 var4 var5

1    45   66   88   888  99

1    35   36   98   254 345

1   546   5     51   564  22

2   654  87   54    654  651

2   546  87  861   651  651

2   215  55  654   896 514

2  654  51  678   651  215

Id dif1112  dif1113  dif1114       dif1115   dif1121  dif1122 .... 

1 (45-66)   (45-88)   (45-888)    (45-888)  (45-35)   (45-36)  ...      

So for id=1 I have 15*14 differences for all possible combinations.

I know I can do it with proc transpose and loop over the variables. I just wanted to know if there is another way. Especially combing a loop and lag.

Peter_C
Rhodochrosite | Level 12

The problem is more easily solved (at pair level) with a "self join" in PROC SQL

Peter_C
Rhodochrosite | Level 12

tested, here is the pair-wise join in SQL

it creates 500 rows

data starting ;

input

Id   var1 var2 var3 var4 var5;

list;cards;

1    45   66   88   888  99

1    35   36   98   254 345

1   546  5     51   564  .

2   654  87 54    654  651

2   546  87  861 651  651

2   215  55  654  896 514

2  654  51  678  651  215

;

data starting2 ;

set starting ;

  array vars (*) var: ;

  do varn = 1 to dim(vars) ;

     var = vars(varn) ;

     output ;

  end ;

keep id varn var ;

run ;

proc sql ;

create table start2 as

select a.id, a.var as left, b.var as right, a.varn as leftN, b.varN as rightN

from starting2 as a

join starting2 as b

  on a.id   eq b.id

and a.varn ne b.varn

;

quit ;

Astounding
PROC Star

Here's an approach (although I definitely did not dot all the i's or cross all the t's).  It requires that you find out the maximum number of records for a single ID.  For illustration purposes, I'll use 4 as the max.  It also names the variables dif_001 - dif_190.  You could always spell out the names in the proper order (which is not 100% clear and might be the hardest part of the whole task).  Finally, it assumes that if you have taken the equivalent of dif1112, you do not need the same calculation in reverse order which would turn out to be dif1211.  After the long worded introduction, load a single dimensional array.  Then calculating all differences is fairly easy.

data want;

   array values {20} _temporary_;

   array diffs {190} dif_001 - dif_190;

   array vars {5} var1-var5;

   n_rows=0;

   do until (last.id);

      set have;

      by id;

      n_rows + 1;

      do _n_=1 to 5;

         values{5*(n_rows-1) + _n_} = vars{_n_};

      end;

   end;

   _n_=0;

   do _i_=1 to 5*n_rows - 1;

      do _k_ = _i_ + 1 to 5*n_rows;

         _n_ + 1;

         diffs{_n_} = values{_i_} - values{_k_};

      end;

   end;

run;

All of this is untested code.  The naming difficulties might be easier if you were to use a two-dimensional array.  But the first step is to see if the approach makes sense to you.

Good luck.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2187 views
  • 0 likes
  • 6 in conversation