Group Year
A 2011
A 2011
A 2011
A 2011
A 2010
A 2010
A 2010
B 2011
B 2011
B 2011
B 2010
B 2010
B 2009
B 2009
. .
. .
. .
Data Table4;
set Table3;
by group;
if year in (2009, 2010, 2011);
run;
My goal is to end up with a dataset containing only the groups that have observations from 2009, 2010, and 2011 (not just 2010 and 2011). Each observation has only one year, but a group spans multiple years. In the complete data set there are roughly 2000 groups, only 500 of which have all three years. I've tried all the permutations I can think of, e.g.:
Data Table4;
set Table3;
by group;
if (year = 2009) && (year= 2010) && (year= 2011); /* This data step returns nothing when it SHOULD return a data set with all of group B */
run;
I've scoured the web and every other resource available, but nothing has worked the way I need it to. I've also tried a solution in PROC SQL, but it's just as clunky. Thank you for any advice.
If using SQL in one step,
to cheat (this only works if the data contain no years other than 2009, 2010, and 2011):
proc sql;
create table want as
select * from have
group by group
having count (distinct year)=3
;
quit;
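If other years might be present, one possible variant of the shortcut above is to filter on the years first. This is only a sketch: unlike the original query, it also drops any rows from other years in the output, which may or may not be what you want.

```sas
proc sql;
  create table want as
  select *
  from have
  /* assumption: rows from years outside 2009-2011 are not needed in the result */
  where year in (2009, 2010, 2011)
  group by group
  /* a group qualifies only if all three distinct years remain after the filter */
  having count(distinct year) = 3;
quit;
```

Because the query selects all columns while grouping, SAS remerges the summary statistic back onto the detail rows and writes a note about it to the log.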
or for more general purpose:
proc sql;
create table want as
select * from have
group by group
having sum(year=2009)*sum(year=2010)*sum(year=2011)>0
;
quit;
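To see why the product test above works: in SAS a comparison such as year=2009 evaluates to 1 or 0, so each sum() is a per-group count of rows for that year, and the product is zero as soon as one year is missing. A small sketch that just prints those counts (the column names n2009, n2010, and n2011 are made up for illustration):

```sas
proc sql;
  select group,
         sum(year=2009) as n2009,  /* rows in the group with year 2009 */
         sum(year=2010) as n2010,  /* rows in the group with year 2010 */
         sum(year=2011) as n2011   /* rows in the group with year 2011 */
  from have
  group by group;
quit;
```

With the sample data, group A gives 0, 3, 4 (product 0, so it is dropped) and group B gives 2, 2, 3 (product 12, so it is kept).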
Kindly Regards,
Haikuo
Do you want something like this?
data have;
input group $ year;
cards;
A 2011
A 2011
A 2011
A 2011
A 2010
A 2010
A 2010
B 2011
B 2011
B 2011
B 2010
B 2010
B 2009
B 2009
;
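/* One row per group/year, restricted to the three years of interest */
proc sort data=have out=temp (where=(year in (2009,2010,2011))) nodupkey;
by group year;
run;
/* Count distinct years within each group; keep only groups that reach 3 */
data temp;
set temp;
by group;
count + (-first.group*count) + 1;
if count=3;
run;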
proc sql;
create table want as select * from have
where group in (select group from temp)
order by group, year;
quit;
proc print data=want;
run;
Obs group year
1 B 2009
2 B 2009
3 B 2010
4 B 2010
5 B 2011
6 B 2011
7 B 2011
Linlin
Although the SQL approach is more natural for this problem, here is one possible DATA step solution:
data have;
input group $ year;
cards;
A 2011
A 2011
A 2011
A 2011
A 2010
A 2010
A 2010
B 2011
B 2011
B 2011
B 2010
B 2010
B 2009
B 2009
;
data want (drop=_:);
retain _y _c ;
do until (last.group);
set have;
by group descending year ;
if first.group then
do;
_y=year;
if _y in (2009,2010,2011) then _c=1;
end;
if _y ne year and year in (2009,2010,2011) then
do;
_y=year;
_c+1;
end;
end;
do until (last.group);
set have;
by group descending year ;
if _c=3 then output;
end;
_c=0;
run;
Kindly Regards,
Haikuo
I think you're looking for OR rather than AND,
because a single observation can't have year 2009, 2010, and 2011 all at once, but it can have any one of them.
You can use two DOW loops.
data want ;
y2009=0;
y2010=0;
y2011=0;
do until (last.group);
set have (keep=group year);
by group;
if year=2009 then y2009=1;
if year=2010 then y2010=1;
if year=2011 then y2011=1;
end;
do until (last.group);
set have;
by group;
if y2009 and y2010 and y2011 then output;
end;
run;
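Both passes above use BY-group processing, so have needs to be in group order before the step runs. The sample data already is; if yours might not be, a sort along these lines would do it (a sketch, assuming it is fine to sort have in place):

```sas
/* BY group in the DATA step requires the input to be sorted by group */
proc sort data=have;
  by group;
run;
```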
hi ... another double DOW idea ...
data want (drop=years);
length years $200;
do until (last.group);
set have;
by group;
if ^find(years,cat(year)) and year in (2009:2011) then years=catx(',',years,year);
end;
do until (last.group);
set have;
by group;
if length(years) eq 14 then output;
end;
run;
if there are only data from 2009 through 2011 ...
if ^find(years,cat(year)) then years=catx(',',years,year);
I've used a PROC SQL statement similar to Haikuo's solution.
However, I was very impressed by everyone's responses. Thank you so much! I plan on using this forum again whenever I am stumped.
Cheers!