BookmarkSubscribeRSS Feed
radhikaa4
Calcite | Level 5

Hello

 

I have the following dataset

 

subject_iddatevolumeflag_1flag_2
12341/5/20142300
12341/6/20141110
12341/7/20149301
12341/8/20141200
12341/9/20144300
44325/23/20153200
44325/24/20152100
44325/25/20157610
44325/26/20152301

 

 

What I would like to do is extract values before flag_2. Something like this:

 

subject_iddatevolumeflag_1flag_2
12341/5/20142300
12341/6/20141110
12341/7/20149301
44325/23/20153200
44325/24/20152100
44325/25/20157610
44325/26/20152301

 

I am completely stuck using PROC SQL;

 

SELECT subjectid, date, volume, flag_1, flag_2

FROM a.dataset

where flag_2 > flag_1;

 

6 REPLIES 6
ballardw
Super User

It looks like you want dates prior to or the same as when flag_2 =1, not a comparison of flag_1 and flag_2.

 

Is there ever more than one flag_2 = 1 for the same subjected? If so, what is the selection rule then?

What do you want if there are no flag_2=1 for a subject?

 

This untested code might get you started if there is only one flag_2=1:

proc sql;
   create table want as
   SELECT a.subjectid, a.date, a.volume, a.flag_1, a.flag_2
   FROM a.dataset as a
        left join 
        ( select * from a.dataset where 
          flag_2=1) as b
        on a.subject_id = b.subject_id
   where a.date le b.date
   ;
quit;
radhikaa4
Calcite | Level 5
Hi this works!



But now I ran into another problem (which I did not realize)



So:


subject_id

date

volume

flag_1

flag_2

1234

1/5/2014

23

0

0

1234

1/6/2014

11

1

0

1234

1/7/2014

93

0

1

4432

5/23/2015

32

0

0

4432

5/24/2015

21

0

0

4432

5/25/2015

76

1

0

4432

5/26/2015

23

0

1






I also have to exclude the entries BEFORE flag_1 = 1



So:


subject_id

date

volume

flag_1

flag_2

1234

1/6/2014

11

1

0

1234

1/7/2014

93

0

1

4432

5/25/2015

76

1

0

4432

5/26/2015

23

0

1




Thanks!


PGStats
Opal | Level 21

Trying to keep it simple:

 

proc sql;
select *
from have
group by subject_id
having 
    date >= min(case when flag_1 then date else . end) and
    date <= min(case when flag_2 then date else constant('BIG') end);
quit;

but at the same time, being careful to handle cases when there is no flag_1 = 1 or no flag_2 = 1. Using the fact that everything is > Null  and nothing is > constant('BIG')

PG
Astounding
PROC Star

Can the same ID have more than one observation where FLAG_2=1 ?  What should happen?

 

In any case, I would lean toward a DATA step instead of SQL, as long as your data is already in sorted order:

 

data want;
   set have;
   by id;
   retain output_flag 'N';
   if first.id then output_flag='Y';
   if output_flag='Y' then output;
   if flag_2=1 then output_flag='N';
   drop output_flag;
run;

Processing the records sequentially should be much faster than anything that involves a join.

novinosrin
Tourmaline | Level 20

Hi @radhikaa4  it's fairly straight forward-->

 

data have;
input subject_id	date :mmddyy10.	volume	flag_1	flag_2 ;
format date mmddyy10.;
cards;
1234	1/5/2014	23	0	0
1234	1/6/2014	11	1	0
1234	1/7/2014	93	0	1
1234	1/8/2014	12	0	0
1234	1/9/2014	43	0	0
4432	5/23/2015	32	0	0
4432	5/24/2015	21	0	0
4432	5/25/2015	76	1	0
4432	5/26/2015	23	0	1
;

proc sql;
create table want as
select *
from have
group by subject_id
having date<= max((flag_2=1)*date) 
order by subject_id,date;
quit;
novinosrin
Tourmaline | Level 20

Hello @radhikaa4  I too think you should switch to datastep. If your company is paying for a commercial SAS enterprise license, why not use better approaches right?

 

data have;
input subject_id	date :mmddyy10.	volume	flag_1	flag_2 ;
format date mmddyy10.;
cards;
1234	1/5/2014	23	0	0
1234	1/6/2014	11	1	0
1234	1/7/2014	93	0	1
1234	1/8/2014	12	0	0
1234	1/9/2014	43	0	0
4432	5/23/2015	32	0	0
4432	5/24/2015	21	0	0
4432	5/25/2015	76	1	0
4432	5/26/2015	23	0	1
;

data want;
set have;
by subject_id date;
if first.subject_id then d=1;
if d<=1 ;
d+flag_2;
drop d;
run;