Solved: Re: function dif(x) and last.observation

DmytroYermak · Posted 07-28-2017 09:40 AM

Hi all,

Could you please help with the following issue?

We have a dataset with variables x, y, z for subjects 1, 2, 3, etc.

We need to add the difference, dif(x), between the last observation of each subject and the observation before it. A plain dif(x) works on every row, but I cannot get it to work for just the last observation and the one before it. Is there a way to do this?

Please see the code below.

data multiple;
	infile datalines;
	input subject 1-2 X 4 Y 6 Z 8;
datalines;
01 1 2 3
01 4 5 6
01 7 8 9
01 8 9 5
02 8 7 6
02 5 4 3
02 2 1 0
03 8 7 9
03 7 5 4
;
run;

proc sort data=multiple;
		by subject;
run;

data one;
	set multiple;
	by subject;
		MX=dif(x);
			if last.subject then do; LX=dif(x); LY=dif(y); LZ=dif(z); end;
run;

proc print data=one; run;

ballardw · Posted 07-28-2017 11:37 AM

The DIF and LAG functions each maintain their own queue of values. When one is called inside an IF, its queue holds the value from the last time the condition was true, not the value from the previous record.

Note that your result for row 9 is the difference from the previous last-subject row (the last row of subject 2). Subject 1 has no result because there was no previous "last subject" row.
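
To make the queue behavior concrete, here is a minimal sketch that runs both kinds of call side by side on the MULTIPLE data set from the question. The names demo, every_dx and cond_dx are made up for this illustration.

```sas
/* Sketch only: each DIF call in a DATA step has its own queue. */
data demo;
    set multiple;
    by subject;
    /* Unconditional call: the queue is fed on every row. */
    every_dx = dif(x);
    /* Conditional call: the queue is fed only on last-subject rows. */
    if last.subject then cond_dx = dif(x);
run;

proc print data=demo; run;
```

On the last row of each subject (rows 4, 7 and 9), every_dx should be 1, -3 and -1, that is, each x minus the x on the row before. cond_dx should be missing, -6 and 5, because its queue only ever sees the x values 8, 2 and 7 from the last-subject rows.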

Astounding · Posted 07-28-2017 10:00 AM

In general, the way you approach this is to calculate on every observation, then reset values to missing. For example:

LX = Mx;
LY = dif(y);
LZ = dif(z);
if last.subject=0 then do;
	lx = .;
	ly = .;
	lz = .;
end;

Also note that you might want to re-set MX:

MX = dif(x);
if first.subject then mx=.;

DmytroYermak · Posted 07-28-2017 10:34 AM

Thank you. It seems the code below is what I need.

data one (drop= MX MY MZ);
	set multiple;
	by subject;
		MX=dif(x);MY=dif(y);MZ=dif(z);
		if last.subject=0 then do; LX=.; LY=.; LZ=.; end;
				else do; LX=MX; LY=MY; LZ=MZ; output; end;
run;

Astounding · Posted 07-28-2017 12:25 PM

If you are planning on outputting just the last observation for each SUBJECT (as in your latest program), you can get by with much less code:

data one;
	set multiple;
	by subject;
	LX = dif(x);
	LY = dif(y);
	LZ = dif(z);
	if last.subject;
run;

DmytroYermak · Posted 07-28-2017 12:31 PM

Thank you. Here is the task:

And here is my solution:

data one (drop= SumX X SumY Y SumZ Z);
	set multiple;
	by subject;
		difX3_2=dif(x);difY3_2=dif(y);difZ3_2=dif(z);
		difX3_1=dif2(x);difY3_1=dif2(y);difZ3_1=dif2(z);
		SumX+X;SumY+Y;SumZ+Z;
	if last.subject then do; MeanX=SumX/3;MeanY=SumY/3;MeanZ=SumZ/3; SumX=0; SumY=0; SumZ=0; output; end;
run;

Astounding · Posted 07-28-2017 12:45 PM

Given that (a) you need to perform additional calculations, and (b) you need all the results on a single observation, I think you need to switch gears:

data want;
	set have;
	by subject;
	retain x1 x2 y1 y2 z1 z2;
	if first.subject then do;
		x1 = x;
		y1 = y;
		z1 = z;
	end;
	else if last.subject=0 then do;
		x2 = x;
		y2 = y;
		z2 = z;
	end;
	if last.subject;
	*** Now you have all 9 values on a single observation. Perform the final calculations in whatever way you would prefer;
run;
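
One way the final calculations could look is sketched here. It assumes exactly three observations per subject, so that x1 holds the first value, x2 the second and x the third. It borrows the variable names difX3_2, difX3_1 and MeanX from the earlier solution, and the data set name final is made up. Only X is shown; Y and Z would follow the same pattern.

```sas
/* Sketch only: final calculations on the one-row-per-subject data set WANT. */
data final;
    set want;
    difX3_2 = x - x2;            /* third value minus second */
    difX3_1 = x - x1;            /* third value minus first */
    MeanX   = mean(x1, x2, x);   /* MEAN ignores missing arguments */
run;
```

For a subject with only two observations, x2 would still hold the value retained from the previous subject, so the three-observation assumption matters here.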

DmytroYermak · Posted 07-28-2017 03:50 PM

@Astounding wrote:
... I think you need to switch gears:

*** Now you have all 9 values on a single observation. Perform the final calculations in whatever way you would prefer;
run;

I think I got your idea...

Actually it was a task to use LAG, DIF, MEAN. Sorry for not mentioning it.
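
Since the task asked for LAG, DIF and MEAN, a sketch along those lines might look like the following. It assumes three observations per subject, as the variable names in the solution above suggest. Subject 03 in the sample data has only two, so its results would pull in values from subject 02. Only X is shown, and the data set name lagdemo is made up.

```sas
/* Sketch only: LAG, DIF and MEAN called on every row, then subset. */
data lagdemo;
    set multiple;
    by subject;
    difX3_2 = dif(x);                      /* x minus the previous x */
    difX3_1 = dif2(x);                     /* x minus the x two rows back */
    MeanX   = mean(x, lag(x), lag2(x));    /* mean of the last three x values */
    if last.subject;                       /* keep one row per subject */
run;
```

All of the LAG and DIF calls sit before the subsetting IF, so their queues are fed on every row rather than only on the last row of each subject.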
