03-15-2017 11:13 AM
I have a dataset with 20 variables and 16 subjects, but each subject has two rows since I have two timepoints (1 and 2) for each subject. I want to add a row for each subject, marked as timepoint 3, whose values are the differences (timepoint 2 - timepoint 1). For example:
subject_id timepoint lab1 lab2 lab3 lab4 lab5
1 1 0.5 0.6 15 18 12
1 2 0.4 0.6 12 20 18
1 3 -0.1 0.0 -3 2 6
2 1 0.9 1.3 14 18 21
2 2 0.3 1.7 19 22 14
2 3 -0.6 0.4 5 4 -7
The timepoint 3 rows (marked in red in the original post) are what I want.
03-15-2017 11:17 AM
Output;
Lab1=dif(lab1);
lab2=dif(lab2);
.......etc;
If timePoint = 2 then do;
  TimePoint=3;
  Output;
End;
Explicitly output the records.
Use DIF to calculate the difference.
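As a minimal, untested sketch of the approach above, here it is written out for the five lab variables shown in the question. The output dataset name want and reading the rows inline with datalines are assumptions made for illustration. DIF is called on every row, not only on timepoint 2 rows, so that each variable's queue always holds the value from the immediately preceding row.

```sas
data want;
  input subject_id timepoint lab1-lab5;
  output;                      /* write the original row unchanged */
  /* One DIF call per variable: each call keeps its own queue. */
  lab1 = dif(lab1);
  lab2 = dif(lab2);
  lab3 = dif(lab3);
  lab4 = dif(lab4);
  lab5 = dif(lab5);
  if timepoint = 2 then do;
    timepoint = 3;             /* difference row: timepoint 2 - timepoint 1 */
    output;
  end;
  datalines;
1 1 0.5 0.6 15 18 12
1 2 0.4 0.6 12 20 18
2 1 0.9 1.3 14 18 21
2 2 0.3 1.7 19 22 14
;
run;
```

This relies on the timepoint 1 row sitting directly before the timepoint 2 row for every subject; if the data may be out of order, sort by subject and timepoint first.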
02-24-2018 12:07 AM
Can this technique be tweaked to always calculate the dif from the first obs rather than the preceding obs?
Assuming you mean my technique: not quite, but there are easier ways to calculate the difference from the first observation.
Rather than using the DIF() function, you can use a RETAIN statement to hold a value across rows. Set it on the first observation, or on the first.id record of each BY group, and subtract it from each row. Untested, and it probably doesn't account for the first record correctly.
retain first_obs;
if first.id then first_obs = value;
dif = value - first_obs;
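For context, the fragment above only works inside a DATA step that has a BY statement, since first.id is created by BY-group processing. A minimal sketch follows. The dataset names have and want, the visit variable, the result name diff_from_first, and the sample values are all made up for illustration.

```sas
/* Hypothetical input: one row per visit for each ID. */
data have;
  input id visit value;
  datalines;
1 1 10
1 2 12
1 3 9
2 1 20
2 2 25
;
run;

proc sort data=have;
  by id visit;
run;

data want;
  set have;
  by id;                               /* required so that FIRST.ID exists */
  retain first_obs;
  if first.id then first_obs = value;  /* baseline for this ID */
  diff_from_first = value - first_obs; /* 0 on the baseline row itself */
  drop first_obs;
run;
```

With this layout the first row of each id gets a difference of 0, because the baseline is assigned before the subtraction.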
02-24-2018 12:12 AM
data phys_diff;
  set phys1;
  AGG_PHYS_Mean = dif(AGG_PHYS_Mean);
  AGG_PHYS_StdDev = dif(AGG_PHYS_StdDev);
  AGG_PHYS_Median = dif(AGG_PHYS_Median);
  AGG_PHYS_Q1 = dif(AGG_PHYS_Q1);
  AGG_PHYS_Q3 = dif(AGG_PHYS_Q3);
  AGG_PHYS_Min = dif(AGG_PHYS_Min);
  AGG_PHYS_Max = dif(AGG_PHYS_Max);
run;
Above is what I started with.
Works great except the new values are the differences from the immediately preceding obs.
How do I adapt what you just wrote to calc the difference from baseline for each var? Not represented here is that this is done by visit.
02-24-2018 12:28 AM
You replicate the code I have for each variable. You can list multiple variables in the RETAIN statement, but the assignment statements have to happen for each group. I'm assuming you're also doing that difference across patients or groups? I.e., you need to determine the baseline for multiple groups?
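One way to avoid typing the same statements seven times is a pair of arrays. This is an untested sketch, not a confirmed answer to the poster's setup. It assumes phys1 is sorted by a grouping variable, here called subject_id (a hypothetical name, since the real key was not posted), and that the first row in each group is the baseline.

```sas
data phys_diff;
  set phys1;
  by subject_id;                      /* hypothetical grouping variable */
  array vals(*) AGG_PHYS_Mean AGG_PHYS_StdDev AGG_PHYS_Median
                AGG_PHYS_Q1 AGG_PHYS_Q3 AGG_PHYS_Min AGG_PHYS_Max;
  array base(7) _temporary_;          /* temporary arrays are retained automatically */
  do i = 1 to dim(vals);
    if first.subject_id then base(i) = vals(i);  /* store the baseline */
    vals(i) = vals(i) - base(i);                 /* difference from baseline */
  end;
  drop i;
run;
```

The baseline row of each group comes out as 0 for every variable. If the original values are still needed, write the differences to new variables instead of overwriting.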
You should post this as a new question with sample data.
This is probably what you want, it's a different way but probably just as quick:
03-15-2017 11:24 AM
Post test data in the form of a datastep!!
As such, this is only theory:
data inter (drop=lbx:);
  merge have (where=(timepoint=1))
        have (where=(timepoint=2) rename=(lab1=lbx1 lab2=lbx2...));
  by subject_id;
  timepoint=3;
  /* lbx* hold the timepoint 2 values, so this gives timepoint 2 - timepoint 1 */
  lab1=lbx1-lab1;
  lab2=lbx2-lab2;
  ...;
run;
data want;
  set have inter;
run;
proc sort data=want;
  by subject_id timepoint;
run;
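For illustration, here is what the question's sample rows might look like posted as a data step, followed by the merge idea spelled out for the five lab variables shown. This is an untested sketch; the arrays t1 and t2 and the loop index i are additions for brevity, not part of the original suggestion.

```sas
/* Test data from the question, as a data step. */
data have;
  input subject_id timepoint lab1-lab5;
  datalines;
1 1 0.5 0.6 15 18 12
1 2 0.4 0.6 12 20 18
2 1 0.9 1.3 14 18 21
2 2 0.3 1.7 19 22 14
;
run;

/* Put each subject's two timepoints side by side and subtract. */
data inter (drop=lbx: i);
  merge have (where=(timepoint=1))
        have (where=(timepoint=2)
              rename=(lab1=lbx1 lab2=lbx2 lab3=lbx3 lab4=lbx4 lab5=lbx5));
  by subject_id;
  array t1(*) lab1-lab5;       /* timepoint 1 values */
  array t2(*) lbx1-lbx5;       /* timepoint 2 values */
  timepoint = 3;
  do i = 1 to dim(t1);
    t1(i) = t2(i) - t1(i);     /* timepoint 2 - timepoint 1 */
  end;
run;

/* Append the difference rows and restore subject/timepoint order. */
data want;
  set have inter;
run;

proc sort data=want;
  by subject_id timepoint;
run;
```

Unlike the DIF approach, this one does not depend on the two rows of a subject being adjacent, only on have being sorted by subject_id for the merge.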
03-15-2017 12:21 PM
Same solution as @Reeza but, to save keystrokes (and reduce the chance of making a typo), I'd include an array:
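data want;
  input subject_id timepoint lab1-lab5;
  array labs(*) lab1-lab5;
  /* A single DIF call inside a loop shares one queue across all five
     variables, so the previous row is kept in a temporary array instead. */
  array prev(5) _temporary_;
  Output;
  If timePoint = 2 then do;
    do i=1 to dim(labs);
      Labs(i)=labs(i)-prev(i);   /* timepoint 2 - timepoint 1 */
    end;
    TimePoint=3;
    Output;
  end;
  else do i=1 to dim(labs);
    prev(i)=labs(i);             /* remember the timepoint 1 values */
  end;
  drop i;
  cards;
1 1 0.5 0.6 15 18 12
1 2 0.4 0.6 12 20 18
2 1 0.9 1.3 14 18 21
2 2 0.3 1.7 19 22 14
;
run;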
Art, CEO, AnalystFinder.com