Solved
Contributor
Posts: 44

# panel data - drop panelists satisfying a condition

Hi all,

I have a (probably) simple question.

I havea  panel data that looks like:

 ID Year 1 2004 1 2005 1 2006 1 2007 2 2004 2 2005 2 2006 2 2007 2 2008 2 2009

What I want to do is... I want to ignore all panelists that had year 2009.

Can someone please tell me how I can do this?

Basically, I want SAS to go through each panelist and indicate (or remove) those that have 2009 year.

Thanks,

C

Accepted Solutions
Solution
‎10-28-2011 12:44 AM
PROC Star
Posts: 8,164

## panel data - drop panelists satisfying a condition

There are a number of ways to do it.  I happen to like to use the datastep.  One way to do it with a datastep would be:

data panel;

input ID Year income;

cards;

1 2004 50000

1 2005 60000

1 2006 70000

1 2007 80000

2 2004 60000

2 2005 70000

2 2006 80000

2 2007 90000

2 2008 100000

2 2009 110000

3 2004 60000

3 2005 70000

3 2006 80000

3 2007 90000

3 2008 100000

;

data want;

do until (last.id);

set panel;

by id;

if first.id then do;

n=0;

average_salary=0;

end;

if year eq 2009 then drop=1;

average_salary+income;

n+1;

end;

do until (last.id);

set panel;

by id;

if drop ne 1 then do;

if first.id then

average_salary=average_salary/n;

output;

end;

end;

run;

All Replies
Contributor
Posts: 44

## panel data - drop panelists satisfying a condition

second question-

let's say I have a variable "income" for each panlist, and it takes a different value each year for each panelist. how can i calculate average income, across years, for each panelist?

thanks,

C

PROC Star
Posts: 8,164

## panel data - drop panelists satisfying a condition

Can any panelist have a year greater than 2009?

Contributor
Posts: 44

## panel data - drop panelists satisfying a condition

yes. as long as 2009 is not there.

Solution
‎10-28-2011 12:44 AM
PROC Star
Posts: 8,164

## panel data - drop panelists satisfying a condition

There are a number of ways to do it.  I happen to like to use the datastep.  One way to do it with a datastep would be:

data panel;

input ID Year income;

cards;

1 2004 50000

1 2005 60000

1 2006 70000

1 2007 80000

2 2004 60000

2 2005 70000

2 2006 80000

2 2007 90000

2 2008 100000

2 2009 110000

3 2004 60000

3 2005 70000

3 2006 80000

3 2007 90000

3 2008 100000

;

data want;

do until (last.id);

set panel;

by id;

if first.id then do;

n=0;

average_salary=0;

end;

if year eq 2009 then drop=1;

average_salary+income;

n+1;

end;

do until (last.id);

set panel;

by id;

if drop ne 1 then do;

if first.id then

average_salary=average_salary/n;

output;

end;

end;

run;

Contributor
Posts: 44

thank you, sir!

Super User
Posts: 10,784

## panel data - drop panelists satisfying a condition

SQL is easy for this situation.

```data panel;
input ID Year income;
cards;
1 2004 50000
1 2005 60000
1 2006 70000
1 2007 80000
2 2004 60000
2 2005 70000
2 2006 80000
2 2007 90000
2 2008 100000
2 2009 110000
3 2004 60000
3 2005 70000
3 2006 80000
3 2007 90000
3 2008 100000
;
run;

proc sql;
create table want as
select *,avg(income) as avg_income
from panel
group by id
having sum(year=2009)=0;quit;

```

Ksharp

Contributor
Posts: 44

## panel data - drop panelists satisfying a condition

thanks, Ksharp. this almost did the job. except, I need to take average including .'s. In other words, the average is taken only using non-blank values. I want to divide total by (#blank + #non-blank).

thanks,

C

PROC Star
Posts: 8,164

## panel data - drop panelists satisfying a condition

You could still use KSharp's suggested method, but you would have to include the calculation of average.  e.g.,

data panel;

input ID Year income;

cards;

1 2004 50000

1 2005 .

1 2006 .

1 2007 50000

2 2004 60000

2 2005 70000

2 2006 80000

2 2007 90000

2 2008 100000

2 2009 110000

3 2004 60000

3 2005 70000

3 2006 80000

3 2007 90000

3 2008 100000

;

run;

proc sql;

create table want as

select *,sum(income)/count(*) as avg_income

from panel

group by id

having sum(year=2009)=0;

quit;

Super User
Posts: 8,114

## panel data - drop panelists satisfying a condition

I am not sure I understand this requirement.  Are you saying that if a panelist did not report an income for a year you want to treat that as \$0?

Another method that I like to use that requires fewer copies of the data is create a selection criteria that can be used in a where statement.

So for your example I would get a list of id values and use it to subset the data.

proc sql noprint ;

select distinct id into :id2009 separate by ' ' from have where year=2009;

quit;

proc means data=have;

where id not in (&id2009);

var income;

run;

Super User
Posts: 10,784

## panel data - drop panelists satisfying a condition

Art has already given you the correct code.

🔒 This topic is solved and locked.