using dif function and IFN to get difference between each visit

fengyuwuzu · Posted 04-22-2016 03:58 PM

I have a longitudinal data set with three columns: ID, visit_time and measure. For each ID, I want the difference in visit_time and the difference in measure between consecutive visits.

I have two methods and for each method I have a question:

I was thinking of using the LAG or DIF function, as below. Do I need to use RETAIN in this case?

proc sort data=have;
by ID visit_time;
run;

data want;
set have;
by ID;
diff_time=visit_time-lag(visit_time); /* or diff_time=dif(visit_time) */
diff_measure = measure-lag(measure); /* or diff_measure=dif(measure) */
if not first.id then output;
run;

I just learned about the IFN function today, which makes this task easier. Does this code look good?

proc sort data=have;
by ID visit_time;
run;

data want;
set have;
by ID;
diff_time =ifn(first.ID, ., dif(visit_time));
diff_measure=ifn(first.ID, ., dif(measure));
run;

ballardw · Posted 04-22-2016 04:12 PM

Test with a small example data set that you know what the results should be.

The LAG and DIF functions have some interesting behaviors when used with "IF Lag(var) then" type statements. I'm not sure, but I suspect that IFN or IFC may have similar behaviors. Since you know what result you want, you will have to decide whether the code is working for you.

Just from habit I generally work with

LagTime=lag(time);

or

DifTime = Dif(time);

Then I use the LagTime or DifTime variables in later statements and usually drop them from the results.
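
As a rough sketch of the "small example data set" idea, the step below builds a tiny HAVE with made-up values (the IDs, times and measures are assumptions for illustration only) and runs the IFN version from the original post against it. It includes an ID with a single visit so that case gets checked too.

```sas
/* Hypothetical test data: ID 1 has three visits, ID 2 has one, ID 3 has two */
data have;
  input ID visit_time measure;
  datalines;
1 0 10
1 3 12
1 7 11
2 0 20
3 1 5
3 4 9
;
run;

proc sort data=have;
  by ID visit_time;
run;

data want;
  set have;
  by ID;
  /* IFN evaluates dif() on every row, then returns missing on the first row of each ID */
  diff_time    = ifn(first.ID, ., dif(visit_time));
  diff_measure = ifn(first.ID, ., dif(measure));
run;

proc print data=want noobs;
run;
```

Worked out by hand, diff_time should be ., 3, 4 for ID 1, . for ID 2 and ., 3 for ID 3; diff_measure should be ., 2, -1 for ID 1, . for ID 2 and ., 4 for ID 3. If the printed output matches, the step is doing what is wanted on this data.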

FreelanceReinh · Posted 04-22-2016 06:05 PM

Hi @fengyuwuzu,

This is interesting. I would have had similar concerns as @ballardw about LAG or DIF functions in the second or third argument of IFN, because it is tricky to use them properly in a THEN (or ELSE) clause of an IF-THEN/ELSE statement. (I'm not aware of problems using them in an IF condition, though.)

However, it turns out that those IFN function arguments are evaluated regardless of whether the condition in the first argument is true or false. Therefore, your second approach yields correct results. Unlike your first data step, it reproduces all observations from dataset HAVE, including IDs with only a single observation (which might be an advantage).

Using dif(x) rather than x-lag(x) avoids undesirable notes about "Missing values were generated ..." in the log from the first call of lag(x).
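
To make the difference concrete, here is a minimal sketch on made-up data (DEMO_IN and its values are hypothetical, as are the variable names bad_diff and good_diff). It puts DIF in a THEN clause next to the IFN form, so the two results can be compared row by row.

```sas
/* Hypothetical data, already sorted by ID */
data demo_in;
  input ID measure;
  datalines;
1 10
1 12
1 11
3 5
3 9
;
run;

data demo;
  set demo_in;
  by ID;
  /* Pitfall: DIF executes only when the condition is true, so its queue
     never sees the first row of each ID */
  if not first.ID then bad_diff = dif(measure);
  /* IFN evaluates dif(measure) on every row, so the queue stays current */
  good_diff = ifn(first.ID, ., dif(measure));
run;

proc print data=demo noobs;
run;
```

On this data good_diff comes out as ., 2, -1, ., 4, while bad_diff comes out as ., ., -1, ., -2: the conditional call compares each value with the value from the previous row on which DIF actually ran, not with the previous observation.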

PGStats · Posted 04-22-2016 11:08 PM

When using LAG and DIF, it pays to keep the logic as simple as possible. I kinda like:

proc sort data=have;
by ID visit_time;
run;

data want;
set have;
by ID;
diff_time = dif(visit_time);
diff_measure = dif(measure);
if first.ID then call missing(of diff_:);
run;

PG
