Solved: panel data - drop panelists satisfying a condition

jkf91 · Posted 10-27-2011 10:43 PM

Hi all,

I have a (probably) simple question.

I havea panel data that looks like:

ID	Year
1	2004
1	2005
1	2006
1	2007
2	2004
2	2005
2	2006
2	2007
2	2008
2	2009

What I want to do is... I want to ignore all panelists that had year 2009.

Can someone please tell me how I can do this?

Basically, I want SAS to go through each panelist and indicate (or remove) those that have 2009 year.

Thanks,

C

art297 · Posted 10-28-2011 12:44 AM

There are a number of ways to do it. I happen to like to use the datastep. One way to do it with a datastep would be:

data panel;

input ID Year income;

cards;

1 2004 50000

1 2005 60000

1 2006 70000

1 2007 80000

2 2004 60000

2 2005 70000

2 2006 80000

2 2007 90000

2 2008 100000

2 2009 110000

3 2004 60000

3 2005 70000

3 2006 80000

3 2007 90000

3 2008 100000

;

data want;

do until (last.id);

set panel;

by id;

if first.id then do;

n=0;

average_salary=0;

end;

if year eq 2009 then drop=1;

average_salary+income;

n+1;

end;

do until (last.id);

set panel;

by id;

if drop ne 1 then do;

if first.id then

average_salary=average_salary/n;

output;

end;

run;

View solution in original post

jkf91 · Posted 10-27-2011 10:51 PM

second question-

let's say I have a variable "income" for each panlist, and it takes a different value each year for each panelist. how can i calculate average income, across years, for each panelist?

thanks,

C

art297 · Posted 10-27-2011 11:10 PM

Can any panelist have a year greater than 2009?

jkf91 · Posted 10-27-2011 11:15 PM

yes. as long as 2009 is not there.

art297 · Posted 10-28-2011 12:44 AM

There are a number of ways to do it. I happen to like to use the datastep. One way to do it with a datastep would be:

data panel;

input ID Year income;

cards;

1 2004 50000

1 2005 60000

1 2006 70000

1 2007 80000

2 2004 60000

2 2005 70000

2 2006 80000

2 2007 90000

2 2008 100000

2 2009 110000

3 2004 60000

3 2005 70000

3 2006 80000

3 2007 90000

3 2008 100000

;

data want;

do until (last.id);

set panel;

by id;

if first.id then do;

n=0;

average_salary=0;

end;

if year eq 2009 then drop=1;

average_salary+income;

n+1;

end;

do until (last.id);

set panel;

by id;

if drop ne 1 then do;

if first.id then

average_salary=average_salary/n;

output;

end;

run;

jkf91 · Posted 10-28-2011 02:34 AM

thank you, sir!

Ksharp · Posted 10-28-2011 03:32 AM

SQL is easy for this situation.

data panel;
  input ID Year income;
  cards;
1 2004 50000
1 2005 60000
1 2006 70000
1 2007 80000
2 2004 60000
2 2005 70000
2 2006 80000
2 2007 90000
2 2008 100000
2 2009 110000
3 2004 60000
3 2005 70000
3 2006 80000
3 2007 90000
3 2008 100000
;
run;

proc sql;
 create table want as
  select *,avg(income) as avg_income
   from panel
    group by id
     having sum(year=2009)=0;quit;

Ksharp

jkf91 · Posted 10-28-2011 08:44 AM

thanks, Ksharp. this almost did the job. except, I need to take average including .'s. In other words, the average is taken only using non-blank values. I want to divide total by (#blank + #non-blank).

thanks,

C

art297 · Posted 10-28-2011 09:25 AM

You could still use KSharp's suggested method, but you would have to include the calculation of average. e.g.,

data panel;

input ID Year income;

cards;

1 2004 50000

1 2005 .

1 2006 .

1 2007 50000

2 2004 60000

2 2005 70000

2 2006 80000

2 2007 90000

2 2008 100000

2 2009 110000

3 2004 60000

3 2005 70000

3 2006 80000

3 2007 90000

3 2008 100000

;

run;

proc sql;

create table want as

select *,sum(income)/count(*) as avg_income

from panel

group by id

having sum(year=2009)=0;

quit;

Tom · Posted 10-28-2011 10:01 AM

I am not sure I understand this requirement. Are you saying that if a panelist did not report an income for a year you want to treat that as $0?

Another method that I like to use that requires fewer copies of the data is create a selection criteria that can be used in a where statement.

So for your example I would get a list of id values and use it to subset the data.

proc sql noprint ;

select distinct id into :id2009 separate by ' ' from have where year=2009;

quit;

proc means data=have;

where id not in (&id2009);

var income;

run;

Ksharp · Posted 10-30-2011 09:47 PM

Art has already given you the correct code.

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away