Subset data using 'by'


01-24-2012 12:34 PM

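| Group | Year |
|-------|------|
| A | 2011 |
| A | 2011 |
| A | 2011 |
| A | 2011 |
| A | 2010 |
| A | 2010 |
| A | 2010 |
| B | 2011 |
| B | 2011 |
| B | 2011 |
| B | 2010 |
| B | 2010 |
| B | 2009 |
| B | 2009 |
| ... | ... |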

```sas
Data Table4;
  set Table3;
  by group;
  if year in (2009, 2010, 2011);
run;
```

My goal is to end up with a dataset containing only the groups that have observations from 2009, 2010, and 2011 (not just 2010 and 2011). Each observation has only one year, but a group spans multiple years. In the complete dataset there are roughly 2,000 groups, of which only about 500 have all three years. I've tried every permutation I can think of, e.g.:

```sas
Data Table4;
  set Table3;
  by group;
  if (year = 2009) && (year = 2010) && (year = 2011); /* This returns nothing, but it SHOULD return all of group B */
run;
```

I've scoured the web and every other resource available, but nothing has worked the way I need it to. I've also tried a PROC SQL solution, but it's just as clunky. Thank you for any advice!

Accepted Solutions


Posted in reply to JimK

01-24-2012 01:23 PM

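If you want to do it in one SQL step, here is a shortcut. It is a "cheat" because it only checks that a group has three distinct years, so it assumes the data contain no years other than 2009, 2010, and 2011:

```sas
proc sql;
  create table want as
  select * from have
  group by group
  having count(distinct year) = 3;
quit;
```

Or, for a more general-purpose version that checks each required year explicitly:

```sas
proc sql;
  create table want as
  select * from have
  group by group
  /* each SUM counts the rows for one year; the product is > 0 only if all three counts are */
  having sum(year=2009) * sum(year=2010) * sum(year=2011) > 0;
quit;
```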
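A possible variation, sketched here as an assumption rather than something posted in the thread: if the real data also contain years outside 2009 to 2011 and you still want every row of each qualifying group, you can pick the qualifying groups in a subquery and keep all of their rows. The subquery restricts to the three required years before counting, so the distinct-count test stays safe even when other years are present. The sample `have` data uses the same values as the replies.

```sas
/* Sample data, same values as used in the replies */
data have;
  input group $ year;
  datalines;
A 2011
A 2011
A 2011
A 2011
A 2010
A 2010
A 2010
B 2011
B 2011
B 2011
B 2010
B 2010
B 2009
B 2009
;
run;

proc sql;
  create table want as
  select *
  from have
  /* keep every row of a group whose rows cover all three required years */
  where group in (
    select group
    from have
    where year in (2009, 2010, 2011)
    group by group
    having count(distinct year) = 3
  )
  order by group, year;
quit;
```

With the sample data, only the seven group B rows should be selected.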

Kindly Regards,

Haikuo

All Replies


Posted in reply to JimK

01-24-2012 01:14 PM

Do you want something like this?

```sas
data have;
  input group $ :year;
  cards;
A 2011
A 2011
A 2011
A 2011
A 2010
A 2010
A 2010
B 2011
B 2011
B 2011
B 2010
B 2010
B 2009
B 2009
;

proc sort data=have out=temp (where=(year in (2009,2010,2011))) nodupkey;
  by group year;
run;

data temp;
  set temp;
  by group;
  count + (-first.group*count) + 1;
  if count=3;
run;

proc sql;
  create table want as select * from have
  where group in (select group from temp)
  order by group, year;
quit;

proc print data=want;
run;
```

| Obs | group | year |
|-----|-------|------|
| 1 | B | 2009 |
| 2 | B | 2009 |
| 3 | B | 2010 |
| 4 | B | 2010 |
| 5 | B | 2011 |
| 6 | B | 2011 |
| 7 | B | 2011 |

Linlin
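The counter line `count + (-first.group*count) + 1;` in the DATA step above packs a reset and an increment into one sum statement: when `first.group` is 1 it subtracts the current `count`, bringing it back to 0 before adding 1. Below is a sketch of the same steps with an explicit reset, which may be easier to read. It assumes, as in the original, that `temp` comes from the NODUPKEY sort, so each row of `temp` is a distinct required year within its group.

```sas
data have;
  input group $ year;
  datalines;
A 2011
A 2011
A 2011
A 2011
A 2010
A 2010
A 2010
B 2011
B 2011
B 2011
B 2010
B 2010
B 2009
B 2009
;
run;

/* one row per group/year, restricted to the three required years */
proc sort data=have out=temp (where=(year in (2009, 2010, 2011))) nodupkey;
  by group year;
run;

/* count the distinct years per group, resetting at the start of each group */
data temp;
  set temp;
  by group;
  if first.group then count = 0;
  count + 1;     /* sum statement: count is retained across rows */
  if count = 3;  /* keeps the row where the third distinct year is reached */
run;

proc sql;
  create table want as
  select * from have
  where group in (select group from temp)
  order by group, year;
quit;
```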


Posted in reply to Haikuo_old

01-24-2012 02:05 PM

Although the SQL approach is more natural for this problem, here is one possible DATA step solution:

```sas
data have;
  input group $ :year;
  cards;
A 2011
A 2011
A 2011
A 2011
A 2010
A 2010
A 2010
B 2011
B 2011
B 2011
B 2010
B 2010
B 2009
B 2009
;

data want (drop=_:);
  retain _y _c;
  /* first pass: count the distinct required years in the group */
  do until (last.group);
    set have;
    by group descending year;
    if first.group then do;
      _y = year;
      if _y in (2009, 2010, 2011) then _c = 1;
    end;
    if _y ne year and year in (2009, 2010, 2011) then do;
      _y = year;
      _c + 1;
    end;
  end;
  /* second pass: reread the group and output it if all three years were found */
  do until (last.group);
    set have;
    by group descending year;
    if _c = 3 then output;
  end;
  _c = 0;
run;
```

Kindly Regards,

Haikuo


Posted in reply to JimK

01-24-2012 01:25 PM

I think you're looking for OR rather than AND, because a single observation can't be 2009, 2010, and 2011 at the same time, but it can be any one of them.


Posted in reply to JimK

01-24-2012 02:23 PM

You can use two DOW loops.

```sas
data want;
  y2009=0;
  y2010=0;
  y2011=0;
  do until (last.group);
    set have (keep=group year);
    by group;
    if year=2009 then y2009=1;
    if year=2010 then y2010=1;
    if year=2011 then y2011=1;
  end;
  do until (last.group);
    set have;
    by group;
    if y2009 and y2010 and y2011 then output;
  end;
run;
```
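The per-year flags in the double DOW loop above can be generalized so the list of required years lives in one place. This is a sketch, assuming `have` is sorted by `group` (as `BY group` requires); the array names `req` and `seen` and the variable `nfound` are illustrative, not from the thread.

```sas
data have;
  input group $ year;
  datalines;
A 2011
A 2011
A 2011
A 2011
A 2010
A 2010
A 2010
B 2011
B 2011
B 2011
B 2010
B 2010
B 2009
B 2009
;
run;

data want;
  /* hypothetical names: req holds the required years, seen flags which were found */
  array req{3} _temporary_ (2009 2010 2011);
  array seen{3} _temporary_;

  /* temporary arrays are retained, so clear the flags for each new group */
  do i = 1 to dim(seen);
    seen{i} = 0;
  end;

  /* first DOW loop: read the whole group and flag each required year */
  do until (last.group);
    set have;
    by group;
    do i = 1 to dim(req);
      if year = req{i} then seen{i} = 1;
    end;
  end;

  /* count how many of the required years were found */
  nfound = 0;
  do i = 1 to dim(seen);
    nfound = nfound + seen{i};
  end;

  /* second DOW loop: reread the same group and output it if every year was found */
  do until (last.group);
    set have;
    by group;
    if nfound = dim(req) then output;
  end;

  drop i nfound;
run;
```

To require a different set of years, only the initial values (and the dimension) of `req` and `seen` need to change.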


Posted in reply to JimK

01-24-2012 02:42 PM

Hi, here's another double DOW idea:

```sas
data want (drop=years);
  length years $200;
  do until (last.group);
    set have;
    by group;
    if ^find(years,cat(year)) and year in (2009:2011) then years=catx(',',years,year);
  end;
  do until (last.group);
    set have;
    by group;
    /* "2009,2010,2011" is 14 characters long */
    if length(years) eq 14 then output;
  end;
run;
```

If the data contain only the years 2009 through 2011, the IF statement in the first loop can be simplified to:

```sas
if ^find(years,cat(year)) then years=catx(',',years,year);
```


Posted in reply to JimK

01-24-2012 02:51 PM

I used a PROC SQL statement similar to Haikuo's solution.

However, I was very impressed by everyone's answers. Thank you so much for the responses! I plan on using this forum again the next time I'm stumped.

Cheers!