Solved: Re: Urgent help needed for this Tricky Logic based SAs

ShilpaMadhu · Posted 04-03-2020 05:49 PM

Hi All, I have been fighting with this data file..

Problem : There are Visits and Medication number associated with PatientID. For the missing medication, there will be reCalc flag of1. So, what I need to do here is whenever there is 1 I will have to identify the just previous visit(only) for the particular PatientID and need to find how many medications were given in the previous visit(It could be 1 or 2 and N number of medications given during the visits as per the plan). Once found this as well, need to add those medications in the current visit which is flagged as 1.

PatientID	Visit	Medication	reCalc
1	Visit1	111111
1	Visit2	222222
1	Visit2	33333
1	Visit3		1
1	Visit4	888888
…….
12	Visit1	34343
12	Visit2	12345
12	Visit2	67891
12	Visit2	54321
12	Visit4		1
12	Visit5	43333
12	Visit6		1

below is the expected output,

PatientID	Visit	Medication	reCalc
1	Visit1	111111
1	Visit2	222222
1	Visit2	33333
1	Visit3	222222
1	Visit3	33333	1
1	Visit4	888888
…….
12	Visit1	34343
12	Visit2	12345
12	Visit2	67891
12	Visit2	54321
12	Visit4	12345
12	Visit4	67891
12	Visit4	54321	1
12	Visit5	43333
12	Visit6	43333	1

Please please look into this.

mkeintz · Posted 04-06-2020 12:11 AM

So now the business rule needs to be expanded to accommodate one or more visit records with no medications falling between a complete visit and the recalc-visit. So using the LAG function based on the event of first.visit=1 no longer is the rule. It would have to be based on a more complex rule. And using _N_-1 as the end of the prior visit is no longer accurate, meaning a separate conditional lag would be needed.

In this case, I'd drop the use of lag and just set _prior_visit_start and _prior_visit_end variables whenever there is a real visit in hand with real medications:

data have;
infile cards truncover;
input PatientID	Visit $	Medication	reCalc;
cards;
1 Visit1 111111 
1 Visit2 222222 
1 Visit2 33333 
1 Visit3  . 1
1 Visit4 888888 
12 Visit1 34343 
12 Visit2 12345 
12 Visit2 67891 
12 Visit2 54321 
12 Visit4  . 1
12 Visit5 43333 
12 Visit6  . 1
13  Visit1  454545
13  Visit1  778788
13  Visit2  . 
13 Visit3 . 
13  Visit4 .
13  Visit5  .
13  Visit6 .
13  Visit7  . 1
14 Visit1 545450
14  Visit1 3434454
14 Visit2  . 1
14  Visit3 . 1
;
data want (drop=_:);
  set have;
  by patientid visit;

  retain _prior_visit_start  _prior_visit_end;
  if first.visit and medication^=. then _prior_visit_start=_n_;
  if last.visit and medication^=. then _prior_visit_end=_n_;

  if recalc^=1 then output;
  else do ptr=_prior_visit_start to _prior_visit_end;
    set have (keep=medication) point=ptr;
    output;
  end;
run;

Remember, this program assumes that each ID has at least one "real" visit sometime before any visits with recalc=1. If that were not the case, then a recalc record from one id could end up with visit data from the preceding id.

Editted explanatory note:

How this works:

The program reads the incoming data, expecting it to be sorted by patientid/visit (the BY statement). When the first record for a visit is found (first.visit=1) which is also a record with a valid medication (medication^=.) then you are at the start of a visit whose data may be needed later on. So for that record save the record's position in the entire dataset (_prior_visit_start=_N_). Now _N_ is not always the actual position in the dataset, but it is in this simple data step. Do the same for the last record for any such visit (_prior_visit_end=_N_).

The two variables used to record the record position are "retained", so SAS will not default to automatically resetting them to missing values in each iteration of the data step. Which makes them available to guide the re-reading of data when needed.

Now if the incoming record doesn't have recalc=1, there is no need to fetch data from a prior visit, so just output.

But if recalc=1 then re-read each record ("do ptr=_prior_visit_start to _prior_visit_end") in the most recent visit that had actual medications, and re-output each of them. This is done by using another SET statement with the "point=ptr" option. But also note this extra SET statement only reads the medication variable ("(keep=medication)"), so it doesn't overwrite the current visit number with the prior visit number, .. or any other variables that might be in the data.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

novinosrin · Posted 04-03-2020 06:04 PM


data have;
infile cards truncover;
input PatientID	Visit $	Medication	reCalc;
cards;
1	Visit1	111111	 
1	Visit2	222222	 
1	Visit2	33333	 
1	Visit3	 .	1
1	Visit4	888888	 
12	Visit1	34343	 
12	Visit2	12345	 
12	Visit2	67891	 
12	Visit2	54321	 
12	Visit4	 .	1
12	Visit5	43333	 
12	Visit6	 .	1
;


data want;
 set have;
 by patientid visit;
 retain t;
 if last.visit and Medication then t=Medication;
 if recalc then Medication=t;
run;

mkeintz · Posted 04-03-2020 07:02 PM

If the current visit is a single record with recalc=1, you want to repeat the entire medication sequence for the preceding visit by the same patient. So if the preceding visit had 3 medication records, you want to replicate them with the current visit.

This is one of those times, when the mistakenly-abhorred use of a lag function in the IF statement is just the ticket:

data have;
infile cards truncover;
input PatientID	Visit $	Medication	reCalc;
cards;
1	Visit1	111111	 
1	Visit2	222222	 
1	Visit2	33333	 
1	Visit3	 .	1
1	Visit4	888888	 
12	Visit1	34343	 
12	Visit2	12345	 
12	Visit2	67891	 
12	Visit2	54321	 
12	Visit4	 .	1
12	Visit5	43333	 
12	Visit6	 .	1
;
data want (drop=_:);
  set have;
  by patientid visit;

  retain _prior_visit_start;
  if first.visit then _prior_visit_start=lag(_n_);

  if recalc^=1 then output;
  else do ptr=_prior_visit_start to _n_-1;
    set have (keep=medication) point=ptr;
    output;
  end;
run;

Assumptions:

data sorted by patientid/visit
no patient has a first record with recalc=1
any record with recalc=1 is the only record for that visit.

Back to the subject of a lag function inside an IF statement. What you want is to know where in dataset HAVE is the beginning of the prior visit. The "if first.visit" test says to do an action only at the start of the visit. And that action is the lag function, which really should be thought of as an "update queue" function. So instead of looking back one observation, the lag here is looking back one "instance" - namely the prior time first.visit=1. Thus one keeps track of the observation number starting the prior visit.

And of course, the end of the prior visit is simply _N_-1.

Also whenever recalc=1 just re-read only the medication sequence for the prior group outputting once per medication. This is done via the "point=" option on the set statement. Otherwise, if recalc^=1 simply output.

The only way this deviates from your sample want, is that it has recalc=1 in every one of the generated records for the recalc-visit, not just the last one. That would not be hard to fix, but would detract from the central point.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

ShilpaMadhu · Posted 04-04-2020 07:40 AM

Thanks a lot for all your effort. I added your code snippet in my code.

We are almost there,,,, this code is taking most part of my datafile Except that, the below scenario has to be covered,

data have;
infile cards truncover;
input PatientID Visit $ Medication reCalc;
cards;
1 Visit1 111111
1 Visit2 222222
1 Visit2 33333
1 Visit3 . 1
1 Visit4 888888
12 Visit1 34343
12 Visit2 .
12 Visit2 .
12 Visit2 .
12 Visit4 . 1
12 Visit5 43333
12 Visit6 . 1
;

Please try this. When there are many null values, it is nit considering. Please help me with this as well.

mkeintz · Posted 04-04-2020 03:32 PM

I have told you what I understand the task to be - as a specific "business rule".

You have shown additional input data that my understanding does not address. That's find, but you have NOT shown what you want the output to look like. I am not a good mind reader. Please show (1) what the program produced, (2) what you want it to produce, and (3) a description of the actual business rule you are attempting to implement.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

ShilpaMadhu · Posted 04-05-2020 05:44 PM

I appreciate you for your sincere effort on this tricky problem. The solution you have provided was covering my code for Patient 1 and 12. But not for 13 and 14. I have added the scenario here for reference with the desire output. Please have a look.

data have;
infile cards truncover;
input PatientID Visit $ Medication reCalc;
cards;
1 Visit1 111111
1 Visit2 222222
1 Visit2 33333
1 Visit3 . 1
1 Visit4 888888
12 Visit1 34343
12 Visit2 12345
12 Visit2 67891
12 Visit2 54321
12 Visit4 . 1
12 Visit5 43333
12 Visit6 . 1
13 Visit1 454545
13 Visit1 778788
13 Visit2 .
13 Visit3 .
13 Visit4 .
13 Visit5 .
13 Visit6 .
13 Visit7 . 1
14 Visit1 545450
14 Visit1 3434454
14 Visit2 . 1
14 Visit3 . 1

;
Run;

Desire Output :

PatientID	Visit	Medication	reCalc
1	Visit1	111111
1	Visit2	222222
1	Visit2	33333
1	Visit3	222222	1
1	Visit3	33333	1
1	Visit4	888888
12	Visit1	34343
12	Visit2	12345
12 12	Visit2	67891
12	Visit2	54321
12	Visit4	12345	1
12	Visit4	67891	1
12	Visit4	54321	1
12 12	Visit5	43333
12	Visit6	43333	1
13	Visit1	454545
13	Visit1	778788
13	Visit2
13	Visit3
13	Visit4
13	Visit5
13	Visit6
13	Visit7	454545	1
13	Visit7	778788	1
14	Visit1	545450
14	Visit1	3434454
14	Visit2	545450	1
14	Visit2	3434454
14	Visit3	3434454	1
14	Visit2	3434454

ShilpaMadhu · Posted 04-05-2020 05:48 PM

Adding to above, Please read the last 4 records as below, There is a typo for this patient.

14 Visit2 545450 1
14 Visit2 3434454 1
14 Visit3 3434454 1
14 Visit3 3434454 1

ShilpaMadhu · Posted 04-05-2020 05:50 PM

Again,, there was a typo. Please execuse me. Please read the last 4 records as below,

14 Visit2 545450 1
14 Visit2 3434454 1
14 Visit3 545450 1
14 Visit3 3434454 1

mkeintz · Posted 04-06-2020 12:11 AM

So now the business rule needs to be expanded to accommodate one or more visit records with no medications falling between a complete visit and the recalc-visit. So using the LAG function based on the event of first.visit=1 no longer is the rule. It would have to be based on a more complex rule. And using _N_-1 as the end of the prior visit is no longer accurate, meaning a separate conditional lag would be needed.

In this case, I'd drop the use of lag and just set _prior_visit_start and _prior_visit_end variables whenever there is a real visit in hand with real medications:

data have;
infile cards truncover;
input PatientID	Visit $	Medication	reCalc;
cards;
1 Visit1 111111 
1 Visit2 222222 
1 Visit2 33333 
1 Visit3  . 1
1 Visit4 888888 
12 Visit1 34343 
12 Visit2 12345 
12 Visit2 67891 
12 Visit2 54321 
12 Visit4  . 1
12 Visit5 43333 
12 Visit6  . 1
13  Visit1  454545
13  Visit1  778788
13  Visit2  . 
13 Visit3 . 
13  Visit4 .
13  Visit5  .
13  Visit6 .
13  Visit7  . 1
14 Visit1 545450
14  Visit1 3434454
14 Visit2  . 1
14  Visit3 . 1
;
data want (drop=_:);
  set have;
  by patientid visit;

  retain _prior_visit_start  _prior_visit_end;
  if first.visit and medication^=. then _prior_visit_start=_n_;
  if last.visit and medication^=. then _prior_visit_end=_n_;

  if recalc^=1 then output;
  else do ptr=_prior_visit_start to _prior_visit_end;
    set have (keep=medication) point=ptr;
    output;
  end;
run;

Remember, this program assumes that each ID has at least one "real" visit sometime before any visits with recalc=1. If that were not the case, then a recalc record from one id could end up with visit data from the preceding id.

Editted explanatory note:

How this works:

The program reads the incoming data, expecting it to be sorted by patientid/visit (the BY statement). When the first record for a visit is found (first.visit=1) which is also a record with a valid medication (medication^=.) then you are at the start of a visit whose data may be needed later on. So for that record save the record's position in the entire dataset (_prior_visit_start=_N_). Now _N_ is not always the actual position in the dataset, but it is in this simple data step. Do the same for the last record for any such visit (_prior_visit_end=_N_).

The two variables used to record the record position are "retained", so SAS will not default to automatically resetting them to missing values in each iteration of the data step. Which makes them available to guide the re-reading of data when needed.

Now if the incoming record doesn't have recalc=1, there is no need to fetch data from a prior visit, so just output.

But if recalc=1 then re-read each record ("do ptr=_prior_visit_start to _prior_visit_end") in the most recent visit that had actual medications, and re-output each of them. This is done by using another SET statement with the "point=ptr" option. But also note this extra SET statement only reads the medication variable ("(keep=medication)"), so it doesn't overwrite the current visit number with the prior visit number, .. or any other variables that might be in the data.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

ShilpaMadhu · Posted 04-06-2020 06:59 AM

Perfect!!!

The code you shared works in 100 percent perfection.

And I had accepted this as Solution.

Could you please help me understand how does this code work. I just tried your code blindly. And worked. But, did not understand the logicality of it. I am really not as techy as you.

Kudos to you technical skill.

Registration is open

SAS Training: Just a Click Away