Hi,
I am quite new to sas programming and have not been able to figure out this problem. I am Looking for solution to flag records if a certain criteria is met (in the below example, the criteria is value column < 3) and maintains that throughout the succeeding records. If there is any unmet until the end of the record for that ID then dont flag any. Below is the have and want dataset. The data will be sorted as ID and Day
This is have dataset
ID | day | value |
1 | 1 | 3 |
1 | 2 | 2 |
1 | 3 | 3 |
1 | 4 | 3 |
1 | 5 | 2 |
1 | 6 | 2 |
1 | 7 | 1 |
2 | 1 | 2 |
2 | 2 | 1 |
2 | 3 | 3 |
2 | 4 | 2 |
2 | 5 | 5 |
2 | 6 | 3 |
2 | 7 | 1 |
3 | 1 | 5 |
3 | 2 | 2 |
3 | 3 | 4 |
3 | 4 | 1 |
3 | 5 | 2 |
3 | 6 | 1 |
3 | 7 | 2 |
3 | 8 | 2 |
3 | 9 | 1 |
4 | 1 | 3 |
4 | 2 | 3 |
4 | 3 | 3 |
4 | 4 | 4 |
4 | 5 | 2 |
4 | 6 | 5 |
4 | 7 | 1 |
4 | 8 | 4 |
5 | 1 | 2 |
5 | 2 | 3 |
5 | 3 | 5 |
5 | 4 | 1 |
5 | 5 | 8 |
5 | 6 | 1 |
5 | 7 | 2 |
5 | 8 | 5 |
5 | 9 | 9 |
5 | 10 | 2 |
5 | 11 | 2 |
5 | 12 | 1 |
5 | 13 | 7 |
6 | 1 | 5 |
6 | 2 | 1 |
6 | 3 | 6 |
6 | 4 | 7 |
6 | 5 | 8 |
6 | 6 | 1 |
6 | 7 | 7 |
6 | 8 | 3 |
6 | 9 | 2 |
6 | 10 | 2 |
This is what I am looking to get
ID | day | value | flag |
1 | 1 | 3 | |
1 | 2 | 2 | |
1 | 3 | 3 | |
1 | 4 | 3 | |
1 | 5 | 2 | Y |
1 | 6 | 2 | Y |
1 | 7 | 1 | Y |
2 | 1 | 2 | |
2 | 2 | 1 | |
2 | 3 | 3 | |
2 | 4 | 2 | |
2 | 5 | 5 | |
2 | 6 | 3 | |
2 | 7 | 1 | |
3 | 1 | 5 | |
3 | 2 | 2 | |
3 | 3 | 4 | |
3 | 4 | 1 | Y |
3 | 5 | 2 | Y |
3 | 6 | 1 | Y |
3 | 7 | 2 | Y |
3 | 8 | 2 | Y |
3 | 9 | 1 | Y |
4 | 1 | 3 | |
4 | 2 | 3 | |
4 | 3 | 3 | |
4 | 4 | 4 | |
4 | 5 | 2 | |
4 | 6 | 5 | |
4 | 7 | 1 | |
4 | 8 | 4 | |
5 | 1 | 2 | |
5 | 2 | 3 | |
5 | 3 | 5 | |
5 | 4 | 1 | |
5 | 5 | 8 | |
5 | 6 | 1 | |
5 | 7 | 2 | |
5 | 8 | 5 | |
5 | 9 | 9 | |
5 | 10 | 2 | |
5 | 11 | 2 | |
5 | 12 | 1 | |
5 | 13 | 7 | |
6 | 1 | 5 | |
6 | 2 | 1 | |
6 | 3 | 6 | |
6 | 4 | 7 | |
6 | 5 | 8 | |
6 | 6 | 1 | |
6 | 7 | 7 | |
6 | 8 | 3 | |
6 | 9 | 2 | Y |
6 | 10 | 2 | Y |
As you can see the for ID = 1, day 2 is not flagged because there is >= 3 value after that. I am looking to flag records that meet the criteria of value < 3 and maintains it throughout the last records.
Your rule is much easier to implement if you sort by descending DAY.
proc sort data=have;
by id descending day ;
run;
data want ;
set have ;
by id descending day;
if first.id then flag='Y';
if value > 2 then flag=' ';
retain flag;
run;
Results:
Obs ID day value flag 1 1 7 1 Y 2 1 6 2 Y 3 1 5 2 Y 4 1 4 3 5 1 3 3 6 1 2 2 7 1 1 3 8 2 7 1 Y 9 2 6 3 10 2 5 5 11 2 4 2 12 2 3 3 13 2 2 1 14 2 1 2 15 3 9 1 Y 16 3 8 2 Y 17 3 7 2 Y 18 3 6 1 Y 19 3 5 2 Y 20 3 4 1 Y 21 3 3 4 22 3 2 2 23 3 1 5 24 4 8 4 25 4 7 1 26 4 6 5 27 4 5 2 28 4 4 4 29 4 3 3 30 4 2 3 31 4 1 3 32 5 13 7 33 5 12 1 34 5 11 2 35 5 10 2 36 5 9 9 37 5 8 5 38 5 7 2 39 5 6 1 40 5 5 8 41 5 4 1 42 5 3 5 43 5 2 3 44 5 1 2 45 6 10 2 Y 46 6 9 2 Y 47 6 8 3 48 6 7 7 49 6 6 1 50 6 5 8 51 6 4 7 52 6 3 6 53 6 2 1 54 6 1 5
Thank you for the help. This kind of worked but can you help me or point into a direction how to handle if there is below case. I am only looking to flag if two consecutive cases have less than 3 and once that is met then we will flag all missing as well. As in ID 1, Day 3 is not marked because day 4 has missing. So we start from Day 5. On the other hand in ID=2, we flag starting Day 4 and day 5 meets it so we dont care about missing after that as long as they are below 3
ID | day | value | flag |
1 | 1 | 3 | |
1 | 2 | 2 | |
1 | 3 | 1 | |
1 | 4 | . | |
1 | 5 | 2 | Y |
1 | 6 | 2 | Y |
1 | 7 | 1 | Y |
2 | 1 | 2 | |
2 | 2 | 1 | |
2 | 3 | 3 | |
2 | 4 | 2 | Y |
2 | 5 | 2 | Y |
2 | 6 | . | Y |
2 | 7 | 1 | Y |
Yeah. ID=2 and Day 7 should be flagged. I might have missed when I manually did the want dataset.
data have;
input ID day value;
cards;
1 1 3
1 2 2
1 3 3
1 4 3
1 5 2
1 6 2
1 7 1
2 1 2
2 2 1
2 3 3
2 4 2
2 5 5
2 6 3
2 7 1
3 1 5
3 2 2
3 3 4
3 4 1
3 5 2
3 6 1
3 7 2
3 8 2
3 9 1
4 1 3
4 2 3
4 3 3
4 4 4
4 5 2
4 6 5
4 7 1
4 8 4
5 1 2
5 2 3
5 3 5
5 4 1
5 5 8
5 6 1
5 7 2
5 8 5
5 9 9
5 10 2
5 11 2
5 12 1
5 13 7
6 1 5
6 2 1
6 3 6
6 4 7
6 5 8
6 6 1
6 7 7
6 8 3
6 9 2
6 10 2
;
data temp;
set have;
flag=(value<3);
run;
data want;
do until(last.flag);
set temp;
by id flag notsorted;
end;
if last.id then found=1;
do until(last.flag);
set temp;
by id flag notsorted;
if found and flag then want='Y';output;
end;
drop flag found;
run;
Could you please explain how this works.
I split the code into 2 parts but not able to understand.
Part-1:
data want;
do until(last.flag);
set temp;
by id flag notsorted;
end;
if last.id then found=1;
run;
My observation: Here we get the days 1, 2,4,7 (I took for ID=1)
Part-2:
Data want2;
do until(last.flag);
put _all_;
set temp;
by id flag notsorted;
if found and flag then want='Y';output;
end;
/* drop flag found; */
run;
I'm not able to understand how part2 works?
I prepared an excel as per understanding.
ID | DAY | Value | Flag | Found | Want |
1 | 1 | 3 | 0 | 1 | |
1 | 2 | 2 | 1 | 1 | Y |
1 | 3 | 3 | 0 | ||
1 | 4 | 3 | 0 | 1 | |
1 | 5 | 2 | 1 | ||
1 | 6 | 2 | 1 | ||
1 | 7 | 1 | 1 | 1 | Y |
How come 5,6 got Y. Could you help in understanding the code please
There is no need to sort by ID and descending DAY to generate the flag, only to be required to re-sort by original order (by ID and ascending DAY).
As a logical exercise you would
Here's syntax for a compact way of doing the above:
data want (drop=start_day_under_3) ;
set have (in=firstpass)
have (in=secondpass);
by id;
retain start_day_under_3 ;
if first.id then start_day_under_3=. ;
if firstpass then do;
if value<3 and (first.id=1 or lag(value)>=3) then start_day_under_3=day;
else if value>=3 then start_day_under_3=.;
end;
if secondpass;
if start_day_under_3=min(day,start_day_under_3) then flag='Y';
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.