I'm getting a single instance of duplication in a DATA step that I cannot figure out.
Input data:
PatID | FromDate | ThruDate | ProvID |
1000 | 3/3/2017 | 3/3/2017 | 222 |
1000 | 3/6/2017 | 3/6/2017 | 222 |
1000 | 3/7/2017 | 3/7/2017 | 222 |
1000 | 3/8/2017 | 3/8/2017 | 222 |
1000 | 3/9/2017 | 3/9/2017 | 222 |
1000 | 3/10/2017 | 3/10/2017 | 222 |
1000 | 5/8/2017 | 5/8/2017 | 222 |
1000 | 5/9/2017 | 5/9/2017 | 222 |
1000 | 5/10/2017 | 5/10/2017 | 222 |
1000 | 5/11/2017 | 5/11/2017 | 222 |
1000 | 5/12/2017 | 5/12/2017 | 222 |
1000 | 5/15/2017 | 5/15/2017 | 222 |
1000 | 6/27/2017 | 6/27/2017 | 999 |
Code:
DATA STEP2;
KEEP PATID FromDate_NEW ThruDate ProvID;
RENAME FromDate_NEW = FromDate;
SET STEP1;
BY PATID ProvID;
MERGE STEP1
STEP1 (FIRSTOBS=2 KEEP=FromDate RENAME=(FromDate=_NEXT_FROM));
RETAIN FromDate_NEW ThruDate_LAST;
FORMAT FromDate_NEW MMDDYY10.
ThruDate_LAST MMDDYY10.;
IF FIRST.PATID or first.ProvID THEN DO;
ThruDate_LAST = .;
FromDate_NEW=FromDate;
END;
ELSE DO;
IF FromDate - ThruDate_LAST > 2 THEN FromDate_NEW = FromDate;
END;
ThruDate_LAST=ThruDate;
IF LAST.PATID or last.ProvID THEN OUTPUT;
if LAST.PATID=0 AND _next_from - ThruDate > 2 THEN OUTPUT;
RUN;
Output:
PatID | ThruDate | ProvID | FromDate |
1000 | 3/3/2017 | 222 | 3/3/2017 |
1000 | 3/10/2017 | 222 | 3/6/2017 |
1000 | 5/12/2017 | 222 | 5/8/2017 |
1000 | 5/15/2017 | 222 | 5/15/2017 |
1000 | 5/15/2017 | 222 | 5/15/2017 |
1000 | 6/27/2017 | 999 | 6/27/2017 |
Desired output:
PatID | ThruDate | ProvID | FromDate |
1000 | 3/3/2017 | 222 | 3/3/2017 |
1000 | 3/10/2017 | 222 | 3/6/2017 |
1000 | 5/12/2017 | 222 | 5/8/2017 |
1000 | 5/15/2017 | 222 | 5/15/2017 |
1000 | 6/27/2017 | 999 | 6/27/2017 |
Any idea why I'm getting duplication when FromDate = 5/15/17 and ProvID = 222?
Thanks!
I can tell you the cause, but only you can figure out the solution. You're the one who knows what the program is supposed to do.
You have two OUTPUT statements. and you have an observation (the one that repeats an extra time) that meets the IF conditions for both OUTPUT statements. So it gets output twice. It's up to you to change the logic about when to output. It might be as simple as adding the word ELSE after the first OUTPUT statement, or it might be more complex than that.
I guess because you have a set and a merge statement. Since your code is extremely hard to read because of all the shouting and no visual formatting, it's just a guess.
I can tell you the cause, but only you can figure out the solution. You're the one who knows what the program is supposed to do.
You have two OUTPUT statements. and you have an observation (the one that repeats an extra time) that meets the IF conditions for both OUTPUT statements. So it gets output twice. It's up to you to change the logic about when to output. It might be as simple as adding the word ELSE after the first OUTPUT statement, or it might be more complex than that.
@rlafond wrote:
This was helpful, thanks. I thought the vector cleared and the code started over after the first output. ELSE solved it.
OUTPUT is executable. Which means each time the instruction is encountered it executes.
This is intentional behavior as you can specify a specific output put data set as an argument to the statement (assuming the data set name appears on the DATA statement) to direct observations as needed.
Consider as an example:
data work.boys work.girls work.older; set sashelp.class; if sex='F' then output work.girls; if sex='M' then output work.boys; if age ge 14 then output work.older; run;
I will not say that splitting the data is the best option but it is one.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.