BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ShilpaMadhu
Fluorite | Level 6

Hi All, I have been fighting with this data file..

 

Problem : There are Visits and Medication number associated with PatientID. For the missing medication, there will be reCalc flag of1. So, what I need to do here is whenever there is 1 I will have to identify the just previous visit(only) for the particular PatientID and need to find how many medications were given in the previous visit(It could be 1 or 2 and N number of medications given during the visits as per the plan). Once found this as well, need to add those medications in the current visit which is flagged as 1.

 

PatientIDVisitMedicationreCalc
1Visit1111111 
1Visit2222222 
1Visit233333 
1Visit3 1
1Visit4888888 
…….   
12Visit134343 
12Visit212345 
12Visit267891 
12Visit254321 
12Visit4 1
12Visit543333 
12Visit6 1

 

below is the expected output,

 

PatientIDVisitMedicationreCalc
1Visit1111111 
1Visit2222222 
1Visit233333 
1Visit3222222 
1Visit3333331
1Visit4888888 
…….   
12Visit134343 
12Visit212345 
12Visit267891 
12Visit254321 
12Visit412345 
12Visit467891 
12Visit4543211
12Visit543333 
12Visit6433331
 

Please please look into this.

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

So now the business rule needs to be expanded to accommodate one or more visit records with no medications  falling between a complete visit and the recalc-visit.  So using the LAG function based on the event of first.visit=1 no longer is the rule.  It would have to be based on a more complex rule. And using _N_-1 as the end of the prior visit is no longer accurate, meaning a separate conditional lag would be needed.

 

In this case, I'd drop the use of lag and just set _prior_visit_start and _prior_visit_end variables whenever there is a real visit in hand with real medications:

 

data have;
infile cards truncover;
input PatientID	Visit $	Medication	reCalc;
cards;
1 Visit1 111111 
1 Visit2 222222 
1 Visit2 33333 
1 Visit3  . 1
1 Visit4 888888 
12 Visit1 34343 
12 Visit2 12345 
12 Visit2 67891 
12 Visit2 54321 
12 Visit4  . 1
12 Visit5 43333 
12 Visit6  . 1
13  Visit1  454545
13  Visit1  778788
13  Visit2  . 
13 Visit3 . 
13  Visit4 .
13  Visit5  .
13  Visit6 .
13  Visit7  . 1
14 Visit1 545450
14  Visit1 3434454
14 Visit2  . 1
14  Visit3 . 1
;
data want (drop=_:);
  set have;
  by patientid visit;

  retain _prior_visit_start  _prior_visit_end;
  if first.visit and medication^=. then _prior_visit_start=_n_;
  if last.visit and medication^=. then _prior_visit_end=_n_;

  if recalc^=1 then output;
  else do ptr=_prior_visit_start to _prior_visit_end;
    set have (keep=medication) point=ptr;
    output;
  end;
run;

 

Remember, this program assumes that each ID has at least one "real" visit sometime before any visits with recalc=1.    If that were not the case, then a recalc record from one id could end up with visit data from the preceding id.   

 

Editted explanatory note:

 

How this works:

The program reads the incoming data, expecting it to be sorted by patientid/visit (the BY statement).  When the first record for a visit is found (first.visit=1) which is also a record with a valid medication  (medication^=.) then you are at the start of a visit whose data may be needed later on.  So for that record save the record's position in the entire dataset (_prior_visit_start=_N_).  Now _N_ is not always the actual position in the dataset, but it is in this simple data step.   Do the same for the last record for any such visit (_prior_visit_end=_N_).

 

The two variables used to record the record position are "retained", so SAS will not default to automatically resetting them to missing values in each iteration of the data step.  Which makes them available to guide the re-reading of data when needed.

 

Now if the incoming record doesn't have recalc=1, there is no need to fetch data from a prior visit, so just output.

 

But if recalc=1 then re-read each record ("do ptr=_prior_visit_start to _prior_visit_end") in the most recent visit that had actual medications, and re-output each of them.  This is done by using another SET statement with the "point=ptr" option.  But also note this extra SET statement only reads the medication variable  ("(keep=medication)"), so it doesn't overwrite the current visit number with the prior visit number, .. or any other variables that might be in the data.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

9 REPLIES 9
novinosrin
Tourmaline | Level 20

data have;
infile cards truncover;
input PatientID	Visit $	Medication	reCalc;
cards;
1	Visit1	111111	 
1	Visit2	222222	 
1	Visit2	33333	 
1	Visit3	 .	1
1	Visit4	888888	 
12	Visit1	34343	 
12	Visit2	12345	 
12	Visit2	67891	 
12	Visit2	54321	 
12	Visit4	 .	1
12	Visit5	43333	 
12	Visit6	 .	1
;


data want;
 set have;
 by patientid visit;
 retain t;
 if last.visit and Medication then t=Medication;
 if recalc then Medication=t;
run;
mkeintz
PROC Star

If the current visit is a single record with recalc=1, you want to repeat the entire medication sequence for the preceding visit by the same patient.  So if the preceding visit had 3 medication records, you want to replicate them with the current visit.

 

This is one of those times, when the mistakenly-abhorred use of a lag function in the IF statement is just the ticket:

 

data have;
infile cards truncover;
input PatientID	Visit $	Medication	reCalc;
cards;
1	Visit1	111111	 
1	Visit2	222222	 
1	Visit2	33333	 
1	Visit3	 .	1
1	Visit4	888888	 
12	Visit1	34343	 
12	Visit2	12345	 
12	Visit2	67891	 
12	Visit2	54321	 
12	Visit4	 .	1
12	Visit5	43333	 
12	Visit6	 .	1
;
data want (drop=_:);
  set have;
  by patientid visit;

  retain _prior_visit_start;
  if first.visit then _prior_visit_start=lag(_n_);

  if recalc^=1 then output;
  else do ptr=_prior_visit_start to _n_-1;
    set have (keep=medication) point=ptr;
    output;
  end;
run;

Assumptions:

  1. data sorted by patientid/visit
  2. no patient has a first record with recalc=1
  3. any record with recalc=1 is the only record for that visit.

Back to the subject of a lag function inside an IF statement.  What you want is to know where in dataset HAVE is the beginning of the prior visit.   The "if first.visit" test says to do an action only at the start of the visit.  And that action is the lag function, which really should be thought of as an "update queue" function.  So instead of looking back one observation, the lag here is looking back one "instance" - namely the prior time first.visit=1.   Thus one keeps track of the observation number starting the prior visit. 

 

And of course, the end of the prior visit is simply _N_-1.

 

Also whenever recalc=1 just re-read only the medication sequence for the prior group outputting once per medication.  This is done via the "point=" option on the set statement.  Otherwise, if recalc^=1 simply output.

 

The only way this deviates from your sample want, is that it has recalc=1 in every one of the generated records for the recalc-visit, not just the last one.  That would not be hard to fix, but would detract from the central point.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ShilpaMadhu
Fluorite | Level 6

Thanks a lot for all your effort.  I added your code snippet in my code. 

We are almost there,,,, this code is taking most part of my datafile Except that, the below scenario has to be covered,

 

data have;
infile cards truncover;
input PatientID Visit $ Medication reCalc;
cards;
1 Visit1 111111
1 Visit2 222222
1 Visit2 33333
1 Visit3 . 1
1 Visit4 888888
12 Visit1 34343
12 Visit2  .
12 Visit2  .
12 Visit2 .
12 Visit4 . 1
12 Visit5 43333
12 Visit6 . 1
;

 

Please try this.  When there are many null values, it is nit considering. Please help me with this as well.

 

mkeintz
PROC Star

I have told you what I understand the task to be - as a specific "business rule". 

 

You have shown additional input data that my understanding does not address.  That's find, but you have NOT shown what you want the output to look like.  I am not a good mind reader.  Please show (1) what the program produced, (2) what you want it to produce, and (3) a description of the actual business rule you are attempting to implement.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ShilpaMadhu
Fluorite | Level 6

I appreciate you for your sincere effort on this tricky problem. The solution you have provided was covering my code for Patient 1 and 12. But not for 13 and 14. I have added the scenario here for reference with the desire output. Please have a look.

 

data have;
infile cards truncover;
input PatientID Visit $ Medication reCalc;
cards;
1 Visit1 111111 
1 Visit2 222222 
1 Visit2 33333 
1 Visit3  . 1
1 Visit4 888888 
12 Visit1 34343 
12 Visit2 12345 
12 Visit2 67891 
12 Visit2 54321 
12 Visit4  . 1
12 Visit5 43333 
12 Visit6  . 1
13  Visit1  454545
13  Visit1  778788
13  Visit2  . 
13 Visit3 . 
13  Visit4 .
13  Visit5  .
13  Visit6 .
13  Visit7  . 1
14 Visit1 545450
14  Visit1 3434454
14 Visit2  . 1
14  Visit3 . 1

;
Run;

 

Desire Output :

 

PatientID VisitMedication reCalc

Visit1111111  
1Visit2222222  
1Visit2 33333  
1Visit3222222 1
1Visit333333 1
1Visit4888888  
12  Visit134343  
12Visit2 12345  
12  
12 
Visit267891  
12Visit2 54321  
12Visit4 12345 1
12Visit4 67891 1
12Visit4 54321 1
12  
12
Visit543333  
12Visit6 43333 1
13  Visit1 454545 
13Visit1 778788 
13 Visit2   
13Visit3  
13Visit4  
13Visit5  
13Visit6  
13Visit74545451
13Visit77787881
14 Visit1 545450 
14 Visit13434454 
14 Visit2  5454501
14 Visit2  3434454 
14Visit3 34344541
14 Visit2  3434454 

 

 

ShilpaMadhu
Fluorite | Level 6
Adding to above, Please read the last 4 records as below, There is a typo for this patient.

14 Visit2 545450 1
14 Visit2 3434454 1
14 Visit3 3434454 1
14 Visit3 3434454 1
ShilpaMadhu
Fluorite | Level 6
Again,, there was a typo. Please execuse me. Please read the last 4 records as below,

14 Visit2 545450 1
14 Visit2 3434454 1
14 Visit3 545450 1
14 Visit3 3434454 1
mkeintz
PROC Star

So now the business rule needs to be expanded to accommodate one or more visit records with no medications  falling between a complete visit and the recalc-visit.  So using the LAG function based on the event of first.visit=1 no longer is the rule.  It would have to be based on a more complex rule. And using _N_-1 as the end of the prior visit is no longer accurate, meaning a separate conditional lag would be needed.

 

In this case, I'd drop the use of lag and just set _prior_visit_start and _prior_visit_end variables whenever there is a real visit in hand with real medications:

 

data have;
infile cards truncover;
input PatientID	Visit $	Medication	reCalc;
cards;
1 Visit1 111111 
1 Visit2 222222 
1 Visit2 33333 
1 Visit3  . 1
1 Visit4 888888 
12 Visit1 34343 
12 Visit2 12345 
12 Visit2 67891 
12 Visit2 54321 
12 Visit4  . 1
12 Visit5 43333 
12 Visit6  . 1
13  Visit1  454545
13  Visit1  778788
13  Visit2  . 
13 Visit3 . 
13  Visit4 .
13  Visit5  .
13  Visit6 .
13  Visit7  . 1
14 Visit1 545450
14  Visit1 3434454
14 Visit2  . 1
14  Visit3 . 1
;
data want (drop=_:);
  set have;
  by patientid visit;

  retain _prior_visit_start  _prior_visit_end;
  if first.visit and medication^=. then _prior_visit_start=_n_;
  if last.visit and medication^=. then _prior_visit_end=_n_;

  if recalc^=1 then output;
  else do ptr=_prior_visit_start to _prior_visit_end;
    set have (keep=medication) point=ptr;
    output;
  end;
run;

 

Remember, this program assumes that each ID has at least one "real" visit sometime before any visits with recalc=1.    If that were not the case, then a recalc record from one id could end up with visit data from the preceding id.   

 

Editted explanatory note:

 

How this works:

The program reads the incoming data, expecting it to be sorted by patientid/visit (the BY statement).  When the first record for a visit is found (first.visit=1) which is also a record with a valid medication  (medication^=.) then you are at the start of a visit whose data may be needed later on.  So for that record save the record's position in the entire dataset (_prior_visit_start=_N_).  Now _N_ is not always the actual position in the dataset, but it is in this simple data step.   Do the same for the last record for any such visit (_prior_visit_end=_N_).

 

The two variables used to record the record position are "retained", so SAS will not default to automatically resetting them to missing values in each iteration of the data step.  Which makes them available to guide the re-reading of data when needed.

 

Now if the incoming record doesn't have recalc=1, there is no need to fetch data from a prior visit, so just output.

 

But if recalc=1 then re-read each record ("do ptr=_prior_visit_start to _prior_visit_end") in the most recent visit that had actual medications, and re-output each of them.  This is done by using another SET statement with the "point=ptr" option.  But also note this extra SET statement only reads the medication variable  ("(keep=medication)"), so it doesn't overwrite the current visit number with the prior visit number, .. or any other variables that might be in the data.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ShilpaMadhu
Fluorite | Level 6

Perfect!!!

 

The code you shared works in 100 percent perfection.

 And I had accepted this as Solution.

 

Could you please help me understand how does this code work. I just tried your code blindly. And worked. But, did not understand the logicality of it. I am really not as techy as you.

 

Kudos to you technical skill.

 

 

 

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1889 views
  • 0 likes
  • 3 in conversation