Hi All,
I have data with multiple IDs and multiple dates; the example below contains one ID. Column C (rev_ytd_have) is a YTD value, and it has data inconsistencies. I would like to fix column C into column E and then calculate column F based on column E.
Column D is what I calculated. I need help arriving at the solution shown in column F.
Note: the code contains logic not to subtract on the first observation or in month one, since the value is YTD.
data want;
set have;
by id;
if first.id then rev_curr_want=rev_ytd_have;
else if month(date)=1 then rev_curr_want=rev_ytd_have;
else rev_curr_want=rev_ytd_have-lag(rev_ytd_have);
run;
A | B | C | D | E | F |
date | id | rev_ytd_have | rev_curr_calc | rev_ytd_want | rev_curr_want |
201911 | adf10 | 10 | 10 | 10 | 10 |
201912 | adf10 | 10 | 0 | 10 | 0 |
202001 | adf10 | 13 | 3 | 13 | 13 |
202002 | adf10 | 13 | 0 | 13 | 0 |
202003 | adf10 | 13 | 0 | 13 | 0 |
202004 | adf10 | 13 | 0 | 13 | 0 |
202005 | adf10 | -13 | 13 | 0 | |
202006 | adf10 | 0 | 13 | 0 | |
202007 | adf10 | 14 | 14 | 14 | 1 |
202008 | adf10 | 14 | 0 | 14 | 0 |
202009 | adf10 | 14 | 0 | 14 | 0 |
202010 | adf10 | 14 | 0 | 14 | 0 |
202011 | adf10 | 14 | 0 | 14 | 0 |
202012 | adf10 | 14 | 0 | 14 | 0 |
202101 | adf10 | 16 | 2 | 16 | 16 |
202102 | adf10 | 16 | 0 | 16 | 0 |
202103 | adf10 | -16 | 0 | 0 |
LAG, and its companion function DIF, are queue functions. That means that when you call LAG inside an IF/ELSE branch, the value it returns is the one stored the last time that branch executed, not the value from the previous observation.
I am not at all sure why a YTD involves subtraction, but perhaps this (not tested, as I was too lazy to convert that table into a data step to test the code):
data want;
set have;
by id;
lryh = lag(rev_ytd_have);
if first.id then rev_curr_want=rev_ytd_have;
else if month(date)=1 then rev_curr_want=rev_ytd_have;
else rev_curr_want=rev_ytd_have - lryh;
run;
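To make the queue behavior concrete, here is a minimal sketch on toy data. The data set names demo and lag_demo and the variables x, lag_always and lag_cond are made up for illustration and are not from the original post. It compares a LAG that is called on every observation with one that is called only inside an IF.

```sas
/* Hypothetical toy data: five observations, x = 1 to 5 */
data demo;
  do x = 1 to 5;
    output;
  end;
run;

data lag_demo;
  set demo;
  /* Unconditional call: this queue is updated on every observation, */
  /* so lag_always is the value of x from the previous observation.  */
  lag_always = lag(x);
  /* Conditional call: this LAG keeps its own queue, which is updated */
  /* only when x is odd. For x=3 it returns 1 and for x=5 it returns  */
  /* 3, not the previous observation's values 2 and 4.                */
  if mod(x, 2) = 1 then lag_cond = lag(x);
run;

proc print data=lag_demo noobs;
run;
```

Each LAG call in the code has its own queue, which is why storing the lagged value in a variable first and using that variable conditionally, as in the data step above, avoids the problem.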
Because, as @ballardw said, the LAG and DIF functions are queue managers, not lookbacks, putting them in a THEN assignment of an IF statement will not produce the "conditional lookback" that you want.
You have to update the LAG (or DIF, as below) queue on every observation, but use the result only conditionally. Embedding LAG or DIF (or any function) as an argument of the IFN function always performs the update, even when IFN does not return that argument's value. (The same is true of a LAG inside an IFC function, which returns a character value.)
As a result, this should work:
data want;
set have;
by id;
rev_curr_want=ifn(first.id=1 or month(date)=1,rev_ytd_have,dif(rev_ytd_have));
run;
Untested, in the absence of sample data in the form of a working SAS data step.
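For anyone who wants to try the suggestions in this thread, the posted table can be turned into a data step. What follows is only a sketch under stated assumptions: it reads the date, id and rev_ytd_have columns exactly as posted, and it assumes date should be a true SAS date, read here with the yymmn6. informat. If date is actually stored as a plain number such as 201911, month(date) will not return the calendar month, and something like mod(date, 100) would be needed instead. The sketch does not attempt to clean rev_ytd_have into rev_ytd_want, which the replies above do not address.

```sas
/* Sample data taken from columns A-C of the posted table */
data have;
  input date :yymmn6. id $ rev_ytd_have;
  format date yymmn6.;
  datalines;
201911 adf10 10
201912 adf10 10
202001 adf10 13
202002 adf10 13
202003 adf10 13
202004 adf10 13
202005 adf10 -13
202006 adf10 0
202007 adf10 14
202008 adf10 14
202009 adf10 14
202010 adf10 14
202011 adf10 14
202012 adf10 14
202101 adf10 16
202102 adf10 16
202103 adf10 -16
;
run;

/* The IFN/DIF approach from the reply above, applied to the sample.  */
/* DIF is evaluated on every observation because it is an argument of */
/* IFN, so its queue stays in step with the data.                     */
data want;
  set have;
  by id;
  rev_curr_want = ifn(first.id or month(date) = 1,
                      rev_ytd_have,
                      dif(rev_ytd_have));
run;

proc print data=want noobs;
run;
```

The BY statement requires have to be sorted by id; with a single id, as here, that holds trivially.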