BookmarkSubscribeRSS Feed
huntingchart
Calcite | Level 5

Hi, 

I am quite new to sas programming and have not been able to figure out this problem. I am Looking for solution to flag records if a certain criteria is met (in the below example, the criteria is value column < 3) and maintains that throughout the succeeding records. If there is any unmet until the end of the record for that ID then dont flag any. Below is the have and want dataset. The data will be sorted as ID and Day

This is have dataset 

IDdayvalue
113
122
133
143
152
162
171
212
221
233
242
255
263
271
315
322
334
341
352
361
372
382
391
413
423
433
444
452
465
471
484
512
523
535
541
558
561
572
585
599
5102
5112
5121
5137
615
621
636
647
658
661
677
683
692
6102

 

This is what I am looking to get 

IDdayvalueflag
113 
122 
133 
143 
152Y
162Y
171Y
212 
221 
233 
242 
255 
263 
271 
315 
322 
334 
341Y
352Y
361Y
372Y
382Y
391Y
413 
423 
433 
444 
452 
465 
471 
484 
512 
523 
535 
541 
558 
561 
572 
585 
599 
5102 
5112 
5121 
5137 
615 
621 
636 
647 
658 
661 
677 
683 
692Y
6102Y

 

As you can see the for ID = 1, day 2 is not flagged because there is >= 3 value after that. I am looking to flag records that meet the criteria of value < 3 and maintains it throughout the last records. 

7 REPLIES 7
Tom
Super User Tom
Super User

Your rule is much easier to implement if you sort by descending DAY.

proc sort data=have;
  by id descending day ;
run;

data want ;
  set have ;
  by id descending day;
  if first.id then flag='Y';
  if value > 2 then flag=' ';
  retain flag;
run;

Results:

Obs    ID    day    value    flag

  1     1      7      1       Y
  2     1      6      2       Y
  3     1      5      2       Y
  4     1      4      3
  5     1      3      3
  6     1      2      2
  7     1      1      3

  8     2      7      1       Y
  9     2      6      3
 10     2      5      5
 11     2      4      2
 12     2      3      3
 13     2      2      1
 14     2      1      2

 15     3      9      1       Y
 16     3      8      2       Y
 17     3      7      2       Y
 18     3      6      1       Y
 19     3      5      2       Y
 20     3      4      1       Y
 21     3      3      4
 22     3      2      2
 23     3      1      5

 24     4      8      4
 25     4      7      1
 26     4      6      5
 27     4      5      2
 28     4      4      4
 29     4      3      3
 30     4      2      3
 31     4      1      3

 32     5     13      7
 33     5     12      1
 34     5     11      2
 35     5     10      2
 36     5      9      9
 37     5      8      5
 38     5      7      2
 39     5      6      1
 40     5      5      8
 41     5      4      1
 42     5      3      5
 43     5      2      3
 44     5      1      2

 45     6     10      2       Y
 46     6      9      2       Y
 47     6      8      3
 48     6      7      7
 49     6      6      1
 50     6      5      8
 51     6      4      7
 52     6      3      6
 53     6      2      1
 54     6      1      5
huntingchart
Calcite | Level 5

Thank you for the help. This kind of worked but can you help me or point into a direction how to handle if there is below case. I am only looking to flag if two consecutive cases have less than 3 and once that is met then we will flag all missing as well. As in ID 1, Day 3 is not marked because day 4 has missing. So we start from Day 5. On the other hand in ID=2, we flag starting Day 4 and day 5 meets it so we dont care about missing after that as long as they are below 3

 

IDdayvalueflag
113 
122 
131 
14. 
152Y
162Y
171Y
212 
221 
233 
242Y
252Y
26.Y
271Y

 

huntingchart
Calcite | Level 5

Yeah. ID=2 and Day 7 should be flagged. I might have missed when I manually did the want dataset. 

Ksharp
Super User
data have;
input ID day value;
cards;
1 1 3
1 2 2
1 3 3
1 4 3
1 5 2
1 6 2
1 7 1
2 1 2
2 2 1
2 3 3
2 4 2
2 5 5
2 6 3
2 7 1
3 1 5
3 2 2
3 3 4
3 4 1
3 5 2
3 6 1
3 7 2
3 8 2
3 9 1
4 1 3
4 2 3
4 3 3
4 4 4
4 5 2
4 6 5
4 7 1
4 8 4
5 1 2
5 2 3
5 3 5
5 4 1
5 5 8
5 6 1
5 7 2
5 8 5
5 9 9
5 10 2
5 11 2
5 12 1
5 13 7
6 1 5
6 2 1
6 3 6
6 4 7
6 5 8
6 6 1
6 7 7
6 8 3
6 9 2
6 10 2
;

data temp;
 set have;
 flag=(value<3);
run;

data want;
 do until(last.flag);
  set temp;
  by id flag notsorted;
 end;
 if last.id then found=1;
 do until(last.flag);
  set temp;
  by id flag notsorted;
  if found and flag then want='Y';output;
 end;
 drop flag found;
run;
samanvi
Obsidian | Level 7

Could you please explain how this works.

I split the code into 2 parts but not able to understand.

Part-1:

data want;
do until(last.flag);
set temp;
by id flag notsorted;
end;
if last.id then found=1;
run;

My observation: Here we get the days 1, 2,4,7 (I took for ID=1) 

Part-2:

Data want2;
do until(last.flag);
put _all_;
set temp;
by id flag notsorted;
if found and flag then want='Y';output;
end;
/* drop flag found; */
run;

I'm not able to understand how part2 works?

I prepared an excel as per understanding.

IDDAYValueFlagFoundWant
11301 
12211Y
1330  
14301 
1521  
1621  
17111Y

How come 5,6 got Y.  Could you help in understanding the code please

mkeintz
PROC Star

There is no need to sort by ID and descending DAY to generate the flag, only to be required to re-sort by original order (by ID and ascending DAY).

 

As a logical exercise you would 

  1. Read each ID twice.
  2. On the first pass
    1. If you encounter a value of 2 or less that is either  the beginning of an ID or follows a value greater than 2 then record that START_DAY_UNDER_3.
    2. But if the current value is greater than 2 then reset START_DAY_UNDER_3 to missing
  3. On the second pass,
    1. for every DAY on or after START_DAY_UNDER_3  then set flag='Y'.
    2. output the record.

 

Here's syntax for a compact way of doing the above:

 

data want (drop=start_day_under_3) ;
  set have (in=firstpass)
      have (in=secondpass);
  by id;
  retain start_day_under_3  ;
  if first.id then start_day_under_3=. ;

  if firstpass then do;
    if value<3 and (first.id=1 or lag(value)>=3) then start_day_under_3=day;
    else if value>=3 then start_day_under_3=.;
  end;

  if secondpass;
  if start_day_under_3=min(day,start_day_under_3) then flag='Y';
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 2551 views
  • 2 likes
  • 6 in conversation