🔒 This topic is solved and locked.
JimK
Calcite | Level 5

Group     Year
A     2011
A     2011
A     2011
A     2011
A     2010
A     2010
A     2010
B     2011
B     2011
B     2011
B     2010
B     2010
B     2009
B     2009
.     .
.     .
.     .

data Table4;
  set Table3;
  by group;
  if year in (2009, 2010, 2011);
run;

My goal is to end up with a dataset containing only the groups that have observations from 2009, 2010, and 2011 (not just 2010 and 2011). Each observation has only one year, but a group has multiple years. The complete dataset has roughly 2,000 groups, of which only about 500 have all three years. I've tried every permutation I can think of, e.g.:

data Table4;
  set Table3;
  by group;
  if year = 2009 and year = 2010 and year = 2011;  /* This returns nothing, when it SHOULD return all of group B */
run;

I've scoured the web and every other resource available, but nothing has worked the way I need. I've also tried a PROC SQL solution, but it's just as clunky. Thank you for any advice!

Accepted Solution
Haikuo_old
Fluorite | Level 6

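If you use SQL, this can be done in one step. The "cheat" version, which works only when 2009, 2010, and 2011 are the only years in the data: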

proc sql;
  create table want as
  select * from have
  group by group
  having count(distinct year)=3
  ;
quit;

Or, for a more general purpose (this also works when other years are present):

proc sql;
  create table want as
  select * from have
  group by group
  having sum(year=2009)*sum(year=2010)*sum(year=2011)>0
  ;
quit;

Kindly Regards,

Haikuo
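To see why the more general HAVING condition works, it helps to look at the per-group counts it multiplies. In SAS, a comparison such as (year=2009) evaluates to 1 or 0, so SUM() counts matching rows. The sketch below is a minimal illustration assuming the sample HAVE data posted in this thread; the table name COUNTS and the column names are illustrative choices, not part of the original answer.

```sas
/* Sample data from the thread */
data have;
  input group $ year;
  datalines;
A     2011
A     2011
A     2011
A     2011
A     2010
A     2010
A     2010
B     2011
B     2011
B     2011
B     2010
B     2010
B     2009
B     2009
;
run;

/* One row per group: rows in each target year and their product.   */
/* (year=2009) is 1 when true and 0 when false, so SUM() counts rows. */
proc sql;
  create table counts as
  select group,
         sum(year=2009) as n2009,
         sum(year=2010) as n2010,
         sum(year=2011) as n2011,
         calculated n2009 * calculated n2010 * calculated n2011 as product
  from have
  group by group;
quit;

proc print data=counts;
run;
```

With this data, group A has no 2009 rows, so its product is 0 and the HAVING clause drops it; group B has 2, 2, and 3 rows in the three years, so its product is 2*2*3 = 12 and every B row is kept.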


7 Replies
Linlin
Lapis Lazuli | Level 10

Do you want something like this?

data have;
  input group $ year;
  cards;
A     2011
A     2011
A     2011
A     2011
A     2010
A     2010
A     2010
B     2011
B     2011
B     2011
B     2010
B     2010
B     2009
B     2009
;

proc sort data=have out=temp (where=(year in (2009,2010,2011))) nodupkey;
  by group year;
run;

data temp;
  set temp;
  by group;
  count + (-first.group*count) + 1;
  if count=3;
run;

proc sql;
  create table want as select * from have
    where group in (select group from temp)
    order by group, year;
quit;

proc print data=want;
run;

Obs    group    year
1       B      2009
2       B      2009
3       B      2010
4       B      2010
5       B      2011
6       B      2011
7       B      2011

Linlin
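The sum statement count + (-first.group*count) + 1; resets COUNT to 1 on the first row of each group and adds 1 on every other row. Below is a more explicit equivalent of that counting step, sketched under the assumption that HAVE holds the sample data above. One small deviation from the original: the WHERE= option is placed on the input side of PROC SORT.

```sas
/* Sample data from the thread */
data have;
  input group $ year;
  datalines;
A     2011
A     2011
A     2011
A     2011
A     2010
A     2010
A     2010
B     2011
B     2011
B     2011
B     2010
B     2010
B     2009
B     2009
;
run;

/* One row per group/year, restricted to the three target years */
proc sort data=have(where=(year in (2009, 2010, 2011))) out=temp nodupkey;
  by group year;
run;

/* Explicit version of: count + (-first.group*count) + 1; */
data temp;
  set temp;
  by group;
  if first.group then count = 0;  /* restart the counter for each group   */
  count + 1;                      /* sum statement, so COUNT is retained  */
  if count = 3;                   /* keep groups that have all three years */
run;
```

The remaining PROC SQL step that selects the matching groups from HAVE stays the same.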

Haikuo_old
Fluorite | Level 6

Although the SQL approach is more natural for this problem, here is one possible DATA step solution:

data have;
  input group $ year;
  cards;
A     2011
A     2011
A     2011
A     2011
A     2010
A     2010
A     2010
B     2011
B     2011
B     2011
B     2010
B     2010
B     2009
B     2009
;

data want (drop=_:);
  retain _y _c;
  do until (last.group);
    set have;
    by group descending year;
    if first.group then do;
      _y=year;
      if _y in (2009,2010,2011) then _c=1;
    end;
    if _y ne year and year in (2009,2010,2011) then do;
      _y=year;
      _c+1;
    end;
  end;
  do until (last.group);
    set have;
    by group descending year;
    if _c=3 then output;
  end;
  _c=0;
run;

Kindly Regards,

Haikuo

Reeza
Super User

I think you're looking for OR rather than AND, because a single observation can't be 2009, 2010, and 2011 at the same time, but it can be any one of them.

Tom
Super User

You can use two DOW loops.

data want;
  y2009=0;
  y2010=0;
  y2011=0;
  do until (last.group);
    set have (keep=group year);
    by group;
    if year=2009 then y2009=1;
    if year=2010 then y2010=1;
    if year=2011 then y2011=1;
  end;
  do until (last.group);
    set have;
    by group;
    if y2009 and y2010 and y2011 then output;
  end;
run;
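A variation on the two DOW loops, assuming HAVE is sorted by GROUP: instead of three separate flag variables, it keeps one flag per target year in a temporary array indexed directly by year. SEEN and _YR are hypothetical names chosen for this sketch. Because _TEMPORARY_ array elements are retained across iterations and never written to the output, the flags are reset explicitly for each group, and they do not end up in WANT.

```sas
/* Sample data from the thread, sorted by GROUP */
data have;
  input group $ year;
  datalines;
A     2011
A     2011
A     2011
A     2011
A     2010
A     2010
A     2010
B     2011
B     2011
B     2011
B     2010
B     2010
B     2009
B     2009
;
run;

data want (drop=_yr);
  /* One flag per target year; _TEMPORARY_ elements are retained */
  /* automatically and are never written to WANT.                 */
  array seen[2009:2011] _temporary_;

  /* Reset the flags at the start of each group */
  do _yr = lbound(seen) to hbound(seen);
    seen[_yr] = .;
  end;

  /* First DOW loop: read the whole group and flag the years present */
  do until (last.group);
    set have;
    by group;
    if lbound(seen) <= year <= hbound(seen) then seen[year] = 1;
  end;

  /* Second DOW loop: reread the group and output it only if every flag is set */
  do until (last.group);
    set have;
    by group;
    if n(of seen[*]) = dim(seen) then output;
  end;
run;
```

With this design, checking a different range of years only requires changing the array bounds, since the reset loop and the range check both read them through LBOUND and HBOUND.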

MikeZdeb
Rhodochrosite | Level 12

hi ... another double DOW idea ...

data want (drop=years);
  length years $200;
  do until (last.group);
    set have;
    by group;
    if ^find(years,cat(year)) and year in (2009:2011) then years=catx(',',years,year);
  end;
  do until (last.group);
    set have;
    by group;
    if length(years) eq 14 then output;
  end;
run;

If the data contain only years 2009 through 2011, the IF statement in the first loop can be simplified to:

if ^find(years,cat(year)) then years=catx(',',years,year);

JimK
Calcite | Level 5

I ended up using a PROC SQL statement similar to Haikuo's solution.

However, I was very impressed by everyone's answers. Thank you so much for the responses! I plan on using this forum again the next time I'm stumped.

Cheers!


Discussion stats
  • 7 replies
  • 1409 views
  • 1 like
  • 6 in conversation