Hi all,
Could you please help with the following issue?
We have a dataset with variables x, y, z for subjects 1, 2, 3, etc.
We need to add the difference, dif(x), between the last observation and the observation before last for each subject. A plain dif(x) works, but it does not give the expected result when it is applied only at last.subject. Is there a way to sort this out?
Please see the code and output below.
data multiple;
infile datalines;
input subject 1-2 X 4 Y 6 Z 8;
datalines;
01 1 2 3
01 4 5 6
01 7 8 9
01 8 9 5
02 8 7 6
02 5 4 3
02 2 1 0
03 8 7 9
03 7 5 4
;
run;
proc sort data=multiple;
by subject;
run;
data one;
set multiple;
by subject;
MX=dif(x);
if last.subject then do; LX=dif(x); LY=dif(y); LZ=dif(z); end;
run;
proc print data=one; run;
The DIF and LAG functions maintain separate queues of values, one for each place the function appears in the code. So when one is used inside an IF, its queue holds the value from the last time the condition was true, not the value from the previous record.
Note that your result for row 9 is the difference from the last observation of subject 2. And subject 1 had no result because there was no previous "last subject".
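To make the queue behavior concrete, here is a small sketch (the dataset names demo and queues and the variables every_row and cond_only are made up for illustration). It calls DIF once unconditionally and once only on the last row of each subject, so the two results can be compared side by side:

```sas
/* Hypothetical two-subject dataset, already sorted by subject */
data demo;
  input subject x;
  datalines;
1 1
1 4
1 8
2 8
2 5
2 2
;
run;

data queues;
  set demo;
  by subject;
  /* This DIF runs on every row, so its queue always holds the previous row's x */
  every_row = dif(x);
  /* This DIF runs only on last-of-subject rows, so its queue holds x from the previous such row */
  if last.subject then cond_only = dif(x);
run;

proc print data=queues; run;
```

On the last row of subject 2, every_row should be 2 - 5 = -3 (the previous record), while cond_only should be 2 - 8 = -6 (the last row of subject 1). On the last row of subject 1, cond_only is missing because that is the first time that DIF call executes.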
In general, the way you approach this is to calculate on every observation, then reset values to missing. For example:
LX = Mx;
LY = dif(y);
LZ = dif(z);
if last.subject=0 then do;
lx = .;
ly = .;
lz = .;
end;
Also note that you might want to re-set MX:
MX = dif(x);
if first.subject then mx=.;
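Putting the two fragments above together, a complete step might look like the sketch below. It assumes the sorted dataset multiple from the question and is only one way to arrange the statements:

```sas
data one;
  set multiple;
  by subject;
  /* Call DIF unconditionally so each queue sees every observation */
  MX = dif(x);
  LX = MX;
  LY = dif(y);
  LZ = dif(z);
  /* On the first row of a subject, MX would be a difference across subjects */
  if first.subject then MX = .;
  /* Keep the last-minus-previous differences only on the last row of each subject */
  if not last.subject then do;
    LX = .;
    LY = .;
    LZ = .;
  end;
run;

proc print data=one; run;
```

With the sample data, the last row of subject 1 should show LX=1, LY=1, LZ=-4. One thing to watch for: if a subject has only a single observation, that row is both first and last, so LX, LY, and LZ would still hold a difference from the previous subject and would need to be reset as well.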
Thank you. It seems below is what I need.
data one (drop= MX MY MZ);
set multiple;
by subject;
MX=dif(x); MY=dif(y); MZ=dif(z);
if last.subject=0 then do; LX=.; LY=.; LZ=.; end;
else do; LX=MX; LY=MY; LZ=MZ; output; end;
run;
If you are planning on outputting just the last observation for each SUBJECT (as in your latest program), you can use much less code:
data one;
set multiple;
by subject;
LX = dif(x);
LY = dif(y);
LZ = dif(z);
if last.subject;
run;
Thank you. Here is the task:
And here is my solution:
data one (drop= SumX X SumY Y SumZ Z);
set multiple;
by subject;
difX3_2=dif(x); difY3_2=dif(y); difZ3_2=dif(z);
difX3_1=dif2(x); difY3_1=dif2(y); difZ3_1=dif2(z);
SumX+X; SumY+Y; SumZ+Z;
if last.subject then do;
MeanX=SumX/3; MeanY=SumY/3; MeanZ=SumZ/3;
SumX=0; SumY=0; SumZ=0;
output;
end;
run;
Given that (a) you need to perform additional calculations, and (b) you need all the results on a single observation, I think you need to switch gears:
data want;
set have;
by subject;
retain x1 x2 y1 y2 z1 z2;
if first.subject then do;
x1 = x;
y1 = y;
z1 = z;
end;
else if last.subject=0 then do;
x2 = x;
y2 = y;
z2 = z;
end;
if last.subject;
*** Now you have all 9 values on a single observation. Perform the final calculations in whatever way you would prefer;
run;
@Astounding wrote:... I think you need to switch gears:
I think I got your idea...
Actually, the task was to use LAG, DIF, and MEAN. Sorry for not mentioning it.