BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sasuser31
Calcite | Level 5

I have a dataset that looks like this:

Case_indDateofvisitPersonid
03/20/20161
03/24/20161
14/3/20161
04/3/20161
04/4/20161
04/20/20161
06/20/20171
06/31/20171
07/5/20171
07/9/20171
17/10/20171
07/11/20171
07/12/20171
07/20/20171
01/16/20162
01/16/20162
11/17/20162
01/18/20162
01/25/20162
15/9/20163
09/16/20163
09/18/20163
19/20/20163
09/27/20163

 

It is a dataset that has doctor visits 3 weeks before and 3 weeks after for a person that was diagnosed (case_ind=1) for a condition at the doctor's office. As you can see, this could happen multiple times for one person. I want to group the visits into *visit before diagnosis* and *visit after diagnosis* so I can look into what happened in a health care setting before and after that person was diagnosed.

 

This would presumably look like this:

Case_indDateofvisitPersonidPlace
03/20/201611
03/24/201611
14/3/201610
04/3/201612
04/4/201612
04/20/201612
06/20/201711
06/31/201711
07/5/201711
07/9/201711
17/10/201710
07/11/201712
07/12/201712
07/20/201712
01/16/201621
01/16/201621
11/17/201620
01/18/201622
01/25/201622
15/9/201630
09/16/201631
09/18/201631
19/20/201630
09/27/201632

 

 

I created the following code after messing around for awhile:

data trial_final_data;

set pre_final_data;

by personid ;

if first.personid then place=1;

else if dif(visitdate)>7 then place=1;

if case_ind>0 then place+1;

run;

data trial_final_data1;

set trial_final_data;

if case_ind>0 then place=0;

run;

 

The code almost worked but there are random errors after the first data step. PersonID 3 would have something that looked like this:

15/9/201632
09/16/201632
09/18/201632
19/20/201633
09/27/201633

 

You'll notive that the 2nd observation should be a 1 for the place...but it is a 2. This only happened with like 7 out of 15k people.

 

Does anyone have an idea why this is going on or another approach to get the end result I would like?

 

Thank you so much!

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

The LAG() and DIF() functions do NOT reference the previous observation. They reference the stack of previous values that are built when the function executes.  Having the DIF() function call only in the ELSE clause of an IF/THEN/ELSE statement is going to cause your stack of previous values to be messed the **** up.

 

I good way to avoid this is save the value into a variable that you can then conditionally use.

data trial_final_data;
  set pre_final_data;
  by personid ;
  dif_visitdate=dif(visitdate);
  if first.personid then place=1;
  else if dif_visitdate >7 then place=1;
  if case_ind>0 then place+1;
run;

But for this problem you can just change the order of your tests so that the DIF() always runs.

Are you sure you don't want the last statement to use an ELSE so that if the first record for a person has CASE_IND> 0 it doesn't start with PLACE=2?

 

data trial_final_data;
  set pre_final_data;
  by personid ;
  if dif(visitdate) >7 then place=1;
  else if first.personid then place=1;
  else if case_ind>0 then place+1;
run;

 

View solution in original post

3 REPLIES 3
Tom
Super User Tom
Super User

The LAG() and DIF() functions do NOT reference the previous observation. They reference the stack of previous values that are built when the function executes.  Having the DIF() function call only in the ELSE clause of an IF/THEN/ELSE statement is going to cause your stack of previous values to be messed the **** up.

 

I good way to avoid this is save the value into a variable that you can then conditionally use.

data trial_final_data;
  set pre_final_data;
  by personid ;
  dif_visitdate=dif(visitdate);
  if first.personid then place=1;
  else if dif_visitdate >7 then place=1;
  if case_ind>0 then place+1;
run;

But for this problem you can just change the order of your tests so that the DIF() always runs.

Are you sure you don't want the last statement to use an ELSE so that if the first record for a person has CASE_IND> 0 it doesn't start with PLACE=2?

 

data trial_final_data;
  set pre_final_data;
  by personid ;
  if dif(visitdate) >7 then place=1;
  else if first.personid then place=1;
  else if case_ind>0 then place+1;
run;

 

sasuser31
Calcite | Level 5

Ahhhh that explains a lot! Thank you so much!

 

Do you know of any resources that could help me understand how SAS processes the lag/dif functions? Thanks!

ballardw
Super User

From the online help for the function:

The queue for each occurrence of LAGn is initialized with n missing values, where n is the length of the queue (for example, a LAG2 queue is initialized with two missing values). When an occurrence of LAGn is executed, the value at the top of its queue is removed and returned, the remaining values are shifted upward, and the new value of the argument is placed at the bottom of the queue. Hence, missing values are returned for the first n executions of each occurrence of LAGn, after which the lagged values of the argument begin to appear.

 

The key part is the "EACH OCCURENCE". When you have an If Thisvar>5 then Value=Lag3(othervar); the Occurence is not every record only the record(s) where ThisVar is > 5.

 

Please see

data example;
   do outer=1 to 5;
      do inner= 3 to 7;
      output;
      end;
   end;
run;

data result;
   set example;
   if inner> 5 then do;
      newval=lag3(outer);
      other = lag2(inner);
   end;
run;

Tracing down why "other" = inner and not the value two before will be educational. It has to do that the stack is only updated when Inner is > 5, so 6 and 7 are the only values the lag2(inner) sees.

 

And the first non-missing value of newval = 1 because the lag stack is 3 deep and the first time the stack has a non-missing value for Lag3 is looking at the outer=1 and inner=6.

You might expect to get a value for inner=4,5,6,7

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1093 views
  • 0 likes
  • 3 in conversation