fengyuwuzu
Pyrite | Level 9

I have a longitudinal dataset with three columns: ID, visit_time and measure. For each ID, I want the difference in visit time and in measure between consecutive visits.

I have two methods, and a question about each.

I was thinking of using the LAG or DIF function, as below. Do I need to use RETAIN in this case?

 

proc sort data=have;
by ID visit_time;
run;

data want;
set have;
by ID;
diff_time=visit_time-lag(visit_time); /* or diff_time=dif(visit_time) */
diff_measure = measure-lag(measure); /* or diff_measure=dif(measure) */
if not first.id then output;
run;

I just learned about the IFN function today, which makes this task easier. Does this code look good?

 

proc sort data=have;
by ID time;
run;

data want;
set have;
by ID;
diff_time =ifn(first.ID, ., dif(time));
diff_measure=ifn(first.ID, ., dif(measure)); 
run;
3 REPLIES
ballardw
Super User

Test with a small example data set for which you know what the results should be.
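As a rough sketch of such a test, the data set below is made up for illustration (the values are not from the original post), with variable names taken from the question. It includes one ID with a single visit so that case is covered too.

```sas
/* Hypothetical test data: hand-checkable values, one single-visit ID */
data have;
  input ID visit_time measure;
  datalines;
1 0 10
1 3 12
1 7 15
2 0 20
2 5 18
3 0 30
;
run;

proc sort data=have;
  by ID visit_time;
run;

/* The IFN approach from the question, applied to the test data */
data want;
  set have;
  by ID;
  diff_time    = ifn(first.ID, ., dif(visit_time));
  diff_measure = ifn(first.ID, ., dif(measure));
run;

proc print data=want noobs;
run;
```

Worked by hand, the differences should be missing on the first row of every ID, 3 and 2, then 4 and 3 for ID 1, and 5 and -2 for ID 2. Comparing the printed output against those values shows whether the step does what you intend.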

 

The LAG and DIF functions have some interesting behaviors when used with "IF Lag(var) then" type statements. I'm not sure, but I suspect that IFN or IFC may behave similarly. Since you know what result you want, you will have to decide whether the code is working for you.

 

Just from habit I generally work with

 

LagTime=lag(time);

or

DifTime = Dif(time);

I then use the LagTime or DifTime variables in later statements and usually drop them from the results.
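A minimal sketch of that habit, assuming the variable names from the question (visit_time, measure) and a small made-up data set; the helper names LagTime and LagMeasure are only illustrative.

```sas
/* Hypothetical input, already sorted by ID and visit_time */
data have;
  input ID visit_time measure;
  datalines;
1 0 10
1 3 12
2 0 20
2 5 18
;
run;

data want;
  set have;
  by ID;
  /* Call LAG on every observation so its queue always holds the previous row */
  LagTime    = lag(visit_time);
  LagMeasure = lag(measure);
  /* Use the stored values only where a previous visit exists within the ID */
  if not first.ID then do;
    diff_time    = visit_time - LagTime;
    diff_measure = measure - LagMeasure;
  end;
  drop LagTime LagMeasure;
run;
```

Because LAG is executed unconditionally, the conditional logic only touches ordinary variables, which keeps the step easy to reason about.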

FreelanceReinh
Jade | Level 19

Hi @fengyuwuzu,

 

This is interesting. I would have had similar concerns as @ballardw about LAG or DIF functions in the second or third argument of IFN, because it is tricky to use them properly in a THEN (or ELSE) clause of an IF-THEN/ELSE statement. (I'm not aware of problems using them in an IF condition, though.)

 

However, it turns out that those IFN function arguments are evaluated regardless of whether the condition in the first argument is true or false. Therefore, your second approach yields correct results. Unlike your first data step, it reproduces all observations from dataset HAVE, including IDs with only a single observation (which might be an advantage).
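To illustrate the difference, here is a small sketch on made-up data; the names compare, bad_diff and good_diff are only for this example. It puts LAG inside a THEN clause next to the IFN version.

```sas
/* Hypothetical input, already sorted by ID and visit_time */
data have;
  input ID visit_time;
  datalines;
1 0
1 3
1 7
2 0
2 5
;
run;

data compare;
  set have;
  by ID;
  /* Pitfall: LAG executes only on non-first rows, so its queue never
     sees the first row of an ID and returns the value stored by the
     previous executed call instead of the previous observation */
  if not first.ID then bad_diff = visit_time - lag(visit_time);
  /* IFN evaluates all of its arguments on every row, so DIF is
     called for every observation and its queue stays in step */
  good_diff = ifn(first.ID, ., dif(visit_time));
run;

proc print data=compare noobs;
run;
```

With these values, bad_diff is missing on the second row of ID 1 and is -2 (5 minus 7, carried over from ID 1) on the second row of ID 2, whereas good_diff gives 3 and 5 on those rows.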

 

Using dif(x) rather than x-lag(x) avoids undesirable notes about "Missing values were generated ..." in the log from the first call of lag(x).

 

PGStats
Opal | Level 21

When using LAG and DIF, it pays to keep the logic as simple as possible. I kinda like:

 

proc sort data=have;
by ID time;
run;

data want;
set have;
by ID;
diff_time = dif(time);
diff_measure = dif(measure);
if first.ID then call missing(of diff_:); 
run;
PG

