Re: Question about lagging!

accintron · Posted 03-04-2020 12:07 PM

I am STUMPED!

I sorted my data by descending num_readm as I am trying to lag the last prv_unit_id to be the readm_unit per CLIENT_MASTER_ID if applicable.

here's my code:

proc sort data=readm_same_fac;
by CLIENT_MASTER_ID descending NUM_READM ;
run;

data readm_same_fac;
set readm_same_fac;

by CLIENT_MASTER_ID descending NUM_READM;
if first.CLIENT_MASTER_ID and NUM_READM>1 then READM_UNIT=lag(PRV_UNIT_ID);
if NUM_READM=1 then READM_UNIT=.;

run;

The result is the following: (the highlighted rows are pairs where there were 2 readmissions for the patient in desc. order). The prv_unit_id from the last pair is being lagged to the first of the next pair.... How do I fix this to be that for someone who has more than 1 readm, the first prv_unit_id is lagged onto the second record here?

Thanks in advance!!

ballardw · Posted 03-04-2020 12:13 PM

LAG and DIF are queued functions and when used in an IF then the LAG referenced will be the previous time the IF was true.

Generally it is better to create the lagged values before any IF statement and then assign conditionally. Then drop the unwanted variable.

data readm_same_fac;
   set readm_same_fac;
   lpriv = lag(PRV_UNIT_ID);
   by CLIENT_MASTER_ID descending NUM_READM;
   if first.CLIENT_MASTER_ID and NUM_READM>1 then READM_UNIT= lpriv;
   if NUM_READM=1 then READM_UNIT=.;
   drop lpriv;
run;

ed_sas_member · Posted 03-04-2020 12:16 PM

Hi @accintron

You should never use the LAG function in a conditional statement as it underlies that the condition should be true for both current and prior observation.

I suggest that you set the lag(value) first, and then specifies the conditions :

proc sort data=readm_same_fac;
by CLIENT_MASTER_ID descending NUM_READM ;
run;
 
data readm_same_fac;
set readm_same_fac;
by CLIENT_MASTER_ID descending NUM_READM;

_lag = lag(PRV_UNIT_ID);

if first.CLIENT_MASTER_ID and NUM_READM>1 then READM_UNIT=_lag;
if NUM_READM=1 then READM_UNIT=.;

drop _lag;
run;

mkeintz · Posted 03-04-2020 02:41 PM

@ed_sas_member wrote:

Hi @accintron

You should never use the LAG function in a conditional statement as it underlies that the condition should be true for both current and prior observation.

I suggest that you set the lag(value) first, and then specifies the conditions :

I understand the reason for this suggestion, but my response is never-say-never. Consider a dataset with one record per product per month, sorted by month. Let's say the products are not consistently sorted within month.

Say the user wants the monthly difference in same-product sales. In that case an IF statement is exactly what is needed, as in:

if prod='A' then sales_dif=dif(sales); else
if prod='B' then sales_dif=dif(sales);

…

This forms one DIF queue per product, which in turn works only because the LAG or DIF function is used in an IF statement.

This also helps when not all products are reported every month. In that case the user might use this code, which forms one dif queue and one lag queue per product:

if prod='A' then monthly_change=dif(sales)/ intck('month',lag(date),date); else

if prod='B' then monthly_change=dif(sales)/ intck('month',lag(date),date); else

if prod='C' then monthly_change=dif(sales)/ intck('month',lag(date),date);

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

mkeintz · Posted 03-04-2020 02:24 PM

@accintron wrote:

I am STUMPED!

I sorted my data by descending num_readm as I am trying to lag the last prv_unit_id to be the readm_unit per CLIENT_MASTER_ID if applicable.

here's my code:

proc sort data=readm_same_fac;
by CLIENT_MASTER_ID descending NUM_READM ;
run;

data readm_same_fac;
set readm_same_fac;

by CLIENT_MASTER_ID descending NUM_READM;
if first.CLIENT_MASTER_ID and NUM_READM>1 then READM_UNIT=lag(PRV_UNIT_ID);
if NUM_READM=1 then READM_UNIT=.;

run;

You want a way the ALWAYS update the lag queue, but uses its results only conditionally. The INF function is just what you need. The firt argument is the test condition, the 2nd is the result if true, the 3rd is the result if not true. But regardless of whether the test condition is true, BOTH THE SECOND AND THIRD ARGUMENTS ARE EVALUATED. The means, somewhere in the background, the lag queue is ALWAYS updated, even if not always used.

data readm_same_fac;
set readm_same_fac;

by CLIENT_MASTER_ID descending NUM_READM;

readm_unit=ifn(first.client_master_id=1 and num_readm>1,lag(prv_unit_id),.);
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

mkeintz · Posted 03-04-2020 02:32 PM

I've already provided a response in how to fix your program code.

But are you sorting by descending only to get data from the next record by using LAG? And will you be sorting the result in ascending order afterwards in order to get the original sequence? If do, don't both. Instead use data step tools to look ahead - i.e. a simulated LEAD function. In your case, I think it would be like this (untested):


data readm_same_fac (drop=_:);
  set readm_same_fac (keep=client_master_id num_readm);
  by CLIENT_MASTER_ID NUM_READM;
  merge readm_same_fac
        readm_same_fac (firstobs=2 keep=prv_unit_id rename=(prv_unit_id=_nxt_unit_id));

  if last.client_master_id and num_readm>1 then readm_unit=_nxt_unit_id;
  if num_readm=1 then readm_unit=.;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

accintron · Posted 03-04-2020 02:42 PM

So I've tested everything everyone has mentioned with no success.

They've all seemed to do the same thing, and what keeps getting lagged is the prv_unit_id from the previous CLIENT_MASTER_ID, not from the one it should be lagging.

mkeintz · Posted 03-04-2020 02:45 PM

Then please make a SAS data step with the starting sample data, and then another one with the desired results. I'm too lazy to take an image, convert it to ascii data, make a SAS program to read it, and finally test my program against that data. Help us help you.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

ballardw · Posted 03-04-2020 06:43 PM

@accintron wrote:

So I've tested everything everyone has mentioned with no success.

They've all seemed to do the same thing, and what keeps getting lagged is the prv_unit_id from the previous CLIENT_MASTER_ID, not from the one it should be lagging.

Another issue is possibly your reuse of the same data set for source an output;

data readm_same_fac;
set readm_same_fac;

Once you have added those incorrect values they will stay there until you remove them because you continue to reuse the modified erroneous data as input.

I suggest that you go back to the step that built your READM_SAME_FAC data set the very first time.

Then modify your code to use a different output data set name to check the results.

AND show us the actual code that you used that did not work if that happens.

This is one of the reasons that we will often suggest not using the

data readm_same_fac;
set readm_same_fac;

code structure until you have a LOT of experience with what can go wrong.

Ready to join fellow brilliant minds for the SAS Hackathon?