Hello
I have the following dataset
subject_id | date | volume | flag_1 | flag_2 |
1234 | 1/5/2014 | 23 | 0 | 0 |
1234 | 1/6/2014 | 11 | 1 | 0 |
1234 | 1/7/2014 | 93 | 0 | 1 |
1234 | 1/8/2014 | 12 | 0 | 0 |
1234 | 1/9/2014 | 43 | 0 | 0 |
4432 | 5/23/2015 | 32 | 0 | 0 |
4432 | 5/24/2015 | 21 | 0 | 0 |
4432 | 5/25/2015 | 76 | 1 | 0 |
4432 | 5/26/2015 | 23 | 0 | 1 |
What I would like to do is extract values before flag_2. Something like this:
subject_id | date | volume | flag_1 | flag_2 |
1234 | 1/5/2014 | 23 | 0 | 0 |
1234 | 1/6/2014 | 11 | 1 | 0 |
1234 | 1/7/2014 | 93 | 0 | 1 |
4432 | 5/23/2015 | 32 | 0 | 0 |
4432 | 5/24/2015 | 21 | 0 | 0 |
4432 | 5/25/2015 | 76 | 1 | 0 |
4432 | 5/26/2015 | 23 | 0 | 1 |
I am completely stuck using PROC SQL;
SELECT subjectid, date, volume, flag_1, flag_2
FROM a.dataset
where flag_2 > flag_1;
It looks like you want dates prior to or the same as when flag_2 =1, not a comparison of flag_1 and flag_2.
Is there ever more than one flag_2 = 1 for the same subjected? If so, what is the selection rule then?
What do you want if there are no flag_2=1 for a subject?
This untested code might get you started if there is only one flag_2=1:
proc sql; create table want as SELECT a.subjectid, a.date, a.volume, a.flag_1, a.flag_2 FROM a.dataset as a left join ( select * from a.dataset where flag_2=1) as b on a.subject_id = b.subject_id where a.date le b.date ; quit;
Trying to keep it simple:
proc sql;
select *
from have
group by subject_id
having
date >= min(case when flag_1 then date else . end) and
date <= min(case when flag_2 then date else constant('BIG') end);
quit;
but at the same time, being careful to handle cases when there is no flag_1 = 1 or no flag_2 = 1. Using the fact that everything is > Null and nothing is > constant('BIG')
Can the same ID have more than one observation where FLAG_2=1 ? What should happen?
In any case, I would lean toward a DATA step instead of SQL, as long as your data is already in sorted order:
data want;
set have;
by id;
retain output_flag 'N';
if first.id then output_flag='Y';
if output_flag='Y' then output;
if flag_2=1 then output_flag='N';
drop output_flag;
run;
Processing the records sequentially should be much faster than anything that involves a join.
Hi @radhikaa4 it's fairly straight forward-->
data have;
input subject_id date :mmddyy10. volume flag_1 flag_2 ;
format date mmddyy10.;
cards;
1234 1/5/2014 23 0 0
1234 1/6/2014 11 1 0
1234 1/7/2014 93 0 1
1234 1/8/2014 12 0 0
1234 1/9/2014 43 0 0
4432 5/23/2015 32 0 0
4432 5/24/2015 21 0 0
4432 5/25/2015 76 1 0
4432 5/26/2015 23 0 1
;
proc sql;
create table want as
select *
from have
group by subject_id
having date<= max((flag_2=1)*date)
order by subject_id,date;
quit;
Hello @radhikaa4 I too think you should switch to datastep. If your company is paying for a commercial SAS enterprise license, why not use better approaches right?
data have;
input subject_id date :mmddyy10. volume flag_1 flag_2 ;
format date mmddyy10.;
cards;
1234 1/5/2014 23 0 0
1234 1/6/2014 11 1 0
1234 1/7/2014 93 0 1
1234 1/8/2014 12 0 0
1234 1/9/2014 43 0 0
4432 5/23/2015 32 0 0
4432 5/24/2015 21 0 0
4432 5/25/2015 76 1 0
4432 5/26/2015 23 0 1
;
data want;
set have;
by subject_id date;
if first.subject_id then d=1;
if d<=1 ;
d+flag_2;
drop d;
run;
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.