Not necessary earliest because the "flag" variable is important as well.
@Emma2021 wrote:
Not necessary earliest because the "flag" variable is important as well.
I have to interpret your statement " If not missing then get the row that associated with smallest date's row " as referring to the Flag variable. You do not provide any example/description or use for what to do with a missing flag variable value.
Since none of the Subjid=1 with missing Flag values were included why is the Subjid=2 with a missing flag value included in the output.
You really have not very clearly described the exact roll the value of the Flag variable plays. Providing example output without clear rules is an incomplete description. I can provide programming that would create the exact example that you show for the given input but it very likely would not work in a general sense because the the actual process or rules have not been provided that indicate why the record with
2 6jul2021 .
is wanted in the output but the records like
1 1jul2021 .
are not wanted.
Thank you, but it is wrong because the 5 is date difference (LAG) should be used not _n_.
Why is it wrong?
B.
Combination of date and flag should be used:
1. Check 5 days lag
2. If lag is within 5 days then take the first row:
But here can be flag missing or not missing,
If all missing, then just take the first date
If some not missing, then take the not missing first flag
If all not missing flag, then take the first date
When as below data:
data have;
input id date date9. flag;
format date date9.;
cards;
1 1jul2021 .
1 1jul2021 .
1 4jul2021 1
1 10jul2021 4
1 16jul2021 2
1 18jul2021 1
2 1jul2021 1
2 6jul2021 .
;
run;
Wanted should be as below:
1 4jul2021 1
1 10jul2021 2
2 1jul2021 1
2 6jul2021 .
But when as below data:
data have;
input id date date9. flag;
format date date9.;
cards;
1 1jul2021 .
1 1jul2021 .
1 4jul2022 1
1 10jul2022 4
1 16jul2022 2
1 18jul2022 1
2 1jul2021 1
2 6jul2021 .
;
run;
Then it should be as below:
1 1jul2021 .
1 4jul2022 1
2 1jul2021 1
2 6jul2021 .
Still not explaining the rules clearly.
@Emma2021 wrote:
When as below data:
data have;
input id date date9. flag;
format date date9.;
cards;
1 1jul2021 .
1 1jul2021 .
1 4jul2021 1
1 10jul2021 4
1 16jul2021 2
1 18jul2021 1
2 1jul2021 1
2 6jul2021 .
;
run;
Wanted should be as below:
1 4jul2021 1
1 10jul2021 2
2 1jul2021 1
2 6jul2021 .
But when as below data:
data have;
input id date date9. flag;
format date date9.;
cards;
1 1jul2021 .
1 1jul2021 .
1 4jul2022 1
1 10jul2022 4
1 16jul2022 2
1 18jul2022 1
2 1jul2021 1
2 6jul2021 .
;
run;
Then it should be as below:
1 1jul2021 .
1 4jul2022 1
2 1jul2021 1
2 6jul2021 .
Combination of date and flag should be used:
1. Check 5 days lag
2. If lag is within 5 days then take the first row:
But here can be flag missing or not missing,
If all missing, then just take the first date
If some not missing, then take the not missing first flag
If all not missing flag, then take the first date…
I must be missing something because the logic you describe and the have/want samples you give us don't match - at least not for how I read things.
Below result using the logic as I understand it. Is this what you're after? If not can you please explain in detail what would need to be different - and please in other words and not just re-posting what you've provided already.
Btw: A "flag" has normally only two values - True/False, Y/N, 1/0.... Your variable "flag" is clearly not a flag.
data have;
input id date date9. flag;
format date date9.;
cards;
1 1jul2021 .
1 1jul2021 .
1 4jul2021 1
1 10jul2021 4
1 16jul2021 2
1 18jul2021 1
2 1jul2021 .
2 6jul2021 .
;
/* data inter:
- id rows with non missing flag OR
- all rows for id's where flag is always missing
*/
data inter;
if _n_=1 then
do;
dcl hash h1(dataset:'have(where=(not missing(flag)))');
h1.defineKey('id');
h1.defineDone();
end;
set have;
if not missing(flag) or h1.check() ne 0;
run;
data want;
set inter;
by id date;
dt_dif=dif(date);
if first.id or dt_dif>=5 then output;
drop dt_dif;
run;
proc print data=want;;
run;
@Emma2021 wrote:
Your code is working but in below example it does not working:
data have;
input id date date9. flag;
format date date9.;
cards;
1 1jul2021 .
1 1jul2021 .
1 18jul2021 1
1 4jul2022 1
1 10jul2022 4
1 16jul2022 2
2 1jul2021 .
2 6jul2021 .
;
Then the wanted data should be below:
1 1jul2021 .
1 18jul2021 1
1 4jul2022 1
1 10jul2022 4
1 16jul2022 2
2 1jul2021 .
2 6jul2021 .
?? - Then please re-formulate the logic very clearly.
a: You want the date calculation based on the original sort order of your data and not sorted by date within an ID? or something else.
b: Why do you select 1Jul2021 given your logic states "If some not missing, then take the not missing first flag". Does this logic only apply for the same date within an id? That needs more clarification.
Please also try and spend the time to create sample data that covers all the possible cases. It's not very helpful to get these "your code is working but it's not working" and then you post some modified sample data.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.