I have a longitudinal data set with three columns: ID, visit_time and measure. For each ID, I want the difference in visit time and in measure between consecutive visits.
I have two methods, and a question about each.
First, I was thinking of using the LAG or DIF function, as below. Do I need to use RETAIN in this case?
proc sort data=have;
by ID visit_time;
run;
data want;
set have;
by ID;
diff_time=visit_time-lag(visit_time); /* or diff_time=dif(visit_time) */
diff_measure = measure-lag(measure); /* or diff_measure=dif(measure) */
if not first.id then output;
run;
Second, I just learned about the IFN function today, which makes this task easier. Does this code look good?
proc sort data=have;
by ID time;
run;
data want;
set have;
by ID;
diff_time =ifn(first.ID, ., dif(time));
diff_measure=ifn(first.ID, ., dif(measure));
run;
Test with a small example data set for which you know what the results should be.
The LAG and DIF functions have some interesting behaviors when used with "IF Lag(var) then" type statements. I'm not sure, but I suspect that IFN or IFC may behave similarly. Since you know what you want, you will have to decide whether the code is working for you.
Just from habit I generally work with
LagTime=lag(time);
or
DifTime = Dif(time);
I then use the LagTime or DifTime variables in later statements and usually drop them from the results.
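Putting both suggestions together, here is a minimal sketch, not a definitive solution. It assumes a made-up test data set HAVE with columns ID, time and measure (named as in the code above); the values are invented so that the expected differences are easy to check by hand. LagMeasure is a hypothetical helper name in the same style as LagTime.

```sas
/* Hypothetical test data: values invented for illustration */
data have;
  input ID time measure;
  datalines;
1 0 10
1 3 14
1 7 13
2 1 20
3 2 30
3 5 36
;
run;

proc sort data=have;
  by ID time;
run;

data want;
  set have;
  by ID;
  /* Call LAG unconditionally on every observation so that its
     queue always holds the value from the previous observation */
  LagTime    = lag(time);
  LagMeasure = lag(measure);
  /* Use the lagged values only within an ID; on the first
     observation of an ID the differences stay missing */
  if not first.ID then do;
    diff_time    = time - LagTime;
    diff_measure = measure - LagMeasure;
  end;
  drop LagTime LagMeasure;
run;

proc print data=want noobs;
run;
```

With these values, ID 1 should show differences of 3 and 4 for time and 4 and -1 for measure, ID 3 should show 3 and 6, and the first observation of every ID (including the single observation of ID 2) should be missing.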
Hi @fengyuwuzu,
This is interesting. I would have had similar concerns as @ballardw about LAG or DIF functions in the second or third argument of IFN, because it is tricky to use them properly in a THEN (or ELSE) clause of an IF-THEN/ELSE statement. (I'm not aware of problems using them in an IF condition, though.)
However, it turns out that those IFN function arguments are evaluated regardless of whether the condition in the first argument is true or false. Therefore, your second approach yields correct results. Unlike your first data step, it reproduces all observations from dataset HAVE, including IDs with only a single observation (which might be an advantage).
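To make the contrast concrete, here is a small sketch that puts DIF in a THEN clause next to DIF inside IFN. The data set DEMO, its values, and the variable names wrong_diff and right_diff are assumptions made up for illustration.

```sas
/* Hypothetical data, already sorted by ID and time */
data demo;
  input ID time;
  datalines;
1 0
1 3
1 7
2 2
2 5
;
run;

data compare;
  set demo;
  by ID;
  /* Pitfall: DIF runs only when the condition is true, so its queue
     holds the value from the last observation on which it ran,
     not the value from the previous observation */
  if not first.ID then wrong_diff = dif(time);
  /* IFN evaluates all of its arguments on every observation,
     so this DIF queue is updated on every row */
  right_diff = ifn(first.ID, ., dif(time));
run;

proc print data=compare noobs;
run;
```

Here wrong_diff comes out as ., ., 4, ., -2 while right_diff is ., 3, 4, ., 3. The conditional version misses the first difference within ID 1 and then subtracts a value carried over from ID 1 on the last row of ID 2.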
Using dif(x) rather than x-lag(x) avoids undesirable notes about "Missing values were generated ..." in the log from the first call of lag(x).
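A quick way to see the difference in the log is a sketch like the following, where the data set DEMO_X and the variables d1 and d2 are invented for illustration:

```sas
/* Hypothetical three-row data set */
data demo_x;
  input x;
  datalines;
5
8
12
;
run;

data notes;
  set demo_x;
  d1 = x - lag(x);  /* first row: subtraction with a missing lag value */
  d2 = dif(x);      /* same values as d1, computed inside the function */
run;
```

Both variables hold ., 3, 4. Running the step once with only the d1 line and once with only the d2 line lets you compare the notes in the log.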
When using LAG and DIF, it pays to keep the logic as simple as possible. I kinda like:
proc sort data=have;
by ID time;
run;
data want;
set have;
by ID;
diff_time = dif(time);
diff_measure = dif(measure);
if first.ID then call missing(of diff_:);
run;