Hello
I have the following dataset
| subject_id | date | volume | flag_1 | flag_2 |
| 1234 | 1/5/2014 | 23 | 0 | 0 |
| 1234 | 1/6/2014 | 11 | 1 | 0 |
| 1234 | 1/7/2014 | 93 | 0 | 1 |
| 1234 | 1/8/2014 | 12 | 0 | 0 |
| 1234 | 1/9/2014 | 43 | 0 | 0 |
| 4432 | 5/23/2015 | 32 | 0 | 0 |
| 4432 | 5/24/2015 | 21 | 0 | 0 |
| 4432 | 5/25/2015 | 76 | 1 | 0 |
| 4432 | 5/26/2015 | 23 | 0 | 1 |
What I would like to do is extract values before flag_2. Something like this:
| subject_id | date | volume | flag_1 | flag_2 |
| 1234 | 1/5/2014 | 23 | 0 | 0 |
| 1234 | 1/6/2014 | 11 | 1 | 0 |
| 1234 | 1/7/2014 | 93 | 0 | 1 |
| 4432 | 5/23/2015 | 32 | 0 | 0 |
| 4432 | 5/24/2015 | 21 | 0 | 0 |
| 4432 | 5/25/2015 | 76 | 1 | 0 |
| 4432 | 5/26/2015 | 23 | 0 | 1 |
I am completely stuck using PROC SQL;
SELECT subjectid, date, volume, flag_1, flag_2
FROM a.dataset
where flag_2 > flag_1;
It looks like you want dates prior to or the same as when flag_2 =1, not a comparison of flag_1 and flag_2.
Is there ever more than one flag_2 = 1 for the same subjected? If so, what is the selection rule then?
What do you want if there are no flag_2=1 for a subject?
This untested code might get you started if there is only one flag_2=1:
proc sql;
create table want as
SELECT a.subjectid, a.date, a.volume, a.flag_1, a.flag_2
FROM a.dataset as a
left join
( select * from a.dataset where
flag_2=1) as b
on a.subject_id = b.subject_id
where a.date le b.date
;
quit;
Trying to keep it simple:
proc sql;
select *
from have
group by subject_id
having
date >= min(case when flag_1 then date else . end) and
date <= min(case when flag_2 then date else constant('BIG') end);
quit;
but at the same time, being careful to handle cases when there is no flag_1 = 1 or no flag_2 = 1. Using the fact that everything is > Null and nothing is > constant('BIG')
Can the same ID have more than one observation where FLAG_2=1 ? What should happen?
In any case, I would lean toward a DATA step instead of SQL, as long as your data is already in sorted order:
data want;
set have;
by id;
retain output_flag 'N';
if first.id then output_flag='Y';
if output_flag='Y' then output;
if flag_2=1 then output_flag='N';
drop output_flag;
run;
Processing the records sequentially should be much faster than anything that involves a join.
Hi @radhikaa4 it's fairly straight forward-->
data have;
input subject_id date :mmddyy10. volume flag_1 flag_2 ;
format date mmddyy10.;
cards;
1234 1/5/2014 23 0 0
1234 1/6/2014 11 1 0
1234 1/7/2014 93 0 1
1234 1/8/2014 12 0 0
1234 1/9/2014 43 0 0
4432 5/23/2015 32 0 0
4432 5/24/2015 21 0 0
4432 5/25/2015 76 1 0
4432 5/26/2015 23 0 1
;
proc sql;
create table want as
select *
from have
group by subject_id
having date<= max((flag_2=1)*date)
order by subject_id,date;
quit;
Hello @radhikaa4 I too think you should switch to datastep. If your company is paying for a commercial SAS enterprise license, why not use better approaches right?
data have;
input subject_id date :mmddyy10. volume flag_1 flag_2 ;
format date mmddyy10.;
cards;
1234 1/5/2014 23 0 0
1234 1/6/2014 11 1 0
1234 1/7/2014 93 0 1
1234 1/8/2014 12 0 0
1234 1/9/2014 43 0 0
4432 5/23/2015 32 0 0
4432 5/24/2015 21 0 0
4432 5/25/2015 76 1 0
4432 5/26/2015 23 0 1
;
data want;
set have;
by subject_id date;
if first.subject_id then d=1;
if d<=1 ;
d+flag_2;
drop d;
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.