
Difference between values in two rows

Frequent Contributor
Posts: 84


Hi All,

 

I have a dataset with 20 variables and 16 subjects. Each subject has two rows, one for each timepoint (1 and 2). I want to add a row for each subject, marked as timepoint 3, whose values are the difference (timepoint 2 - timepoint 1). For example:

 

subject id  timepoint  lab1  lab2  lab3  lab4  lab5
1           1           0.5   0.6    15    18    12
1           2           0.4   0.6    12    20    18
1           3          -0.1   0.0    -3     2     6
2           1           0.9   1.3    14    18    21
2           2           0.3   1.7    19    22    14
2           3          -0.6   0.4     5     4    -7

The rows with timepoint 3 (marked in red in the original post) are what I want.

Any ideas?

 

Thanks all,

Chen

Super User
Posts: 23,237

Re: Difference between values in two rows

Output;

Lab1=dif(lab1); lab2=dif(lab2);.......etc;

If timePoint = 2 then do;
TimePoint=3;
Output;
End;

Explicitly OUTPUT each original record first. Then use DIF to calculate the difference from the previous row, and output that as the new timepoint 3 record.

Occasional Contributor
Posts: 10

Re: Difference between values in two rows

Can this technique be tweaked to always calculate the dif from the first obs rather than the preceding obs?

Super User
Posts: 23,237

Re: Difference between values in two rows


kryden wrote:

Can this technique be tweaked to always calculate the dif from the first obs rather than the preceding obs?


Which technique?

 

Assuming mine, not quite, but there are easier ways to calculate the difference from the first observation.

Rather than using the DIF() function, you can use the RETAIN statement to hold a value across rows. Set it on the first observation (or on the first. record of each BY group) and use that. Untested, and it probably doesn't account for the first record correctly.

 

retain first_obs;

if first.id then first_obs = value;

dif = value - first_obs;
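
For reference, here is a minimal sketch of the RETAIN idea above as a complete step. The dataset name `have`, the variables `id`, `visit`, and `value`, and the sample rows are all made up for illustration. Note that `first.id` only exists when the step has a BY statement, so the data must be sorted by `id` first.

```sas
/* Hypothetical sample data: two subjects, three visits each */
data have;
  input id visit value;
  datalines;
1 1 10
1 2 12
1 3 9
2 1 20
2 2 25
2 3 21
;
run;

proc sort data=have;
  by id visit;
run;

data want;
  set have;
  by id;                               /* required so that first.id is defined */
  retain first_obs;                    /* keep the baseline value across rows */
  if first.id then first_obs = value;  /* reset the baseline for each new id */
  diff_from_first = value - first_obs; /* 0 on the baseline row itself */
run;
```

Because the baseline is set before the subtraction, the first row of each `id` gets a difference of 0 rather than a missing value.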

Occasional Contributor
Posts: 10

Re: Difference between values in two rows

 

data phys_diff;
	set phys1;
	AGG_PHYS_Mean = dif(AGG_PHYS_Mean);
	AGG_PHYS_StdDev = dif(AGG_PHYS_StdDev);
	AGG_PHYS_Median = dif(AGG_PHYS_Median);
	AGG_PHYS_Q1 = dif(AGG_PHYS_Q1);
	AGG_PHYS_Q3 = dif(AGG_PHYS_Q3);
	AGG_PHYS_Min = dif(AGG_PHYS_Min);
	AGG_PHYS_Max = dif(AGG_PHYS_Max);

run;

Above is what I started with.

Works great, except the new values are the differences from the immediately preceding obs.

How do I adapt what you just wrote to calc the difference from baseline for each var?  Not represented here is that this is done by visit.

Super User
Posts: 23,237

Re: Difference between values in two rows

You replicate the code I have for each variable. You can list multiple variables in the RETAIN statement, but the assignment statements have to be written for each variable. I'm assuming you're also doing that difference across patients or groups? I.e., you need to determine the baseline for multiple groups?

 

You should post this as a new question with sample data. 

This is probably what you want, it's a different way but probably just as quick:

https://communities.sas.com/t5/Base-SAS-Programming/Calculate-a-difference-from-quot-baseline-quot-d...
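
If replicating the statements for every variable gets tedious, arrays are one way to shorten it. This is only a sketch under assumptions: the BY variable `patient_group` is hypothetical (the thread does not say what defines a group), `phys1` is assumed to be sorted by it, and the first row of each group is treated as the baseline.

```sas
data phys_diff;
  set phys1;
  by patient_group;   /* hypothetical grouping variable; phys1 must be sorted by it */

  array vals(*) AGG_PHYS_Mean AGG_PHYS_StdDev AGG_PHYS_Median
                AGG_PHYS_Q1 AGG_PHYS_Q3 AGG_PHYS_Min AGG_PHYS_Max;
  array base(7) _temporary_;   /* temporary array elements are retained automatically */

  do i = 1 to dim(vals);
    if first.patient_group then base(i) = vals(i);  /* store the baseline */
    vals(i) = vals(i) - base(i);                    /* difference from baseline */
  end;

  drop i;
run;
```

As in the original step, the variables are overwritten with their differences, so the baseline row of each group becomes 0.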

Occasional Contributor
Posts: 10

Re: Difference between values in two rows

Will post as new question.



Thanks for the help!


Occasional Contributor
Posts: 10

Re: Difference between values in two rows

Link to New topic
 
Super User
Posts: 9,397

Re: Difference between values in two rows

Post test data in the form of a datastep!!

 

As such, this is only theory:

data inter;
  merge have (where=(timepoint=1))
        have (where=(timepoint=2) rename=(lab1=lbx1 lab2=lbx2...));
  by subject_id;
  timepoint=3;
  lab1=lbx1-lab1;  /* timepoint 2 minus timepoint 1 */
  lab2=lbx2-lab2;
  ...;
  drop lbx:;       /* remove the renamed helper variables */
run;

data want;
  set have inter;
run;

proc sort data=want;
  by subject_id timepoint;
run;
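
A fuller sketch of the same merge idea, assuming the five lab variables from the example in the question (`lab1`-`lab5`) and that `have` is already sorted by `subject_id` and `timepoint`. The helper names `base1`-`base5` are made up. Here the timepoint 1 rows are the ones renamed, so the subtraction reads directly as timepoint 2 minus timepoint 1.

```sas
data inter;
  merge have (where=(timepoint=1) rename=(lab1-lab5=base1-base5))
        have (where=(timepoint=2));
  by subject_id;

  array labs(*) lab1-lab5;    /* timepoint 2 values */
  array base(*) base1-base5;  /* timepoint 1 values (hypothetical names) */

  do i = 1 to dim(labs);
    labs(i) = labs(i) - base(i);   /* timepoint 2 minus timepoint 1 */
  end;

  timepoint = 3;
  drop i base1-base5;
run;

data want;
  set have inter;
  by subject_id timepoint;   /* interleave the two sorted inputs */
run;
```

Interleaving with SET and BY keeps the rows in subject and timepoint order, so no separate PROC SORT is needed as long as both inputs are sorted.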
PROC Star
Posts: 8,145

Re: Difference between values in two rows

Same solution as @Reeza but, to save keystrokes (and reduce the chance of making a typo), I'd include an array:

 

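data want;
  input subject_id timepoint lab1-lab5;
  array labs(*) lab1-lab5;
  output;                      /* write the original row first */

  /* run DIF on every row, not only when timepoint = 2, */
  /* so it always compares against the previous row     */
  do i=1 to dim(labs);
    labs(i)=dif(labs(i));
  end;

  if timepoint = 2 then do;
    timepoint=3;               /* the difference row */
    output;
  end;
  drop i;
  cards;
1  1   0.5    0.6    15     18   12
1  2   0.4    0.6    12     20   18
2  1   0.9    1.3    14     18   21
2  2   0.3    1.7    19     22   14
;
run;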

Art, CEO, AnalystFinder.com

 
