Help using Base SAS procedures

panel data - drop panelists satisfying a condition

Accepted Solution Solved
Reply
Contributor
Posts: 44
Accepted Solution

panel data - drop panelists satisfying a condition

Hi all,

I have a (probably) simple question.

I havea  panel data that looks like:

ID

 

Year

 

1

 

2004

 

1

 

2005

 

1

 

2006

 

1

 

2007

 

2

 

2004

 

2

 

2005

 

2

 

2006

 

2

 

2007

 

2

 

2008

 

2

 

2009

 

What I want to do is... I want to ignore all panelists that had year 2009.

Can someone please tell me how I can do this?

Basically, I want SAS to go through each panelist and indicate (or remove) those that have 2009 year.

Thanks,

C


Accepted Solutions
Solution
‎10-28-2011 12:44 AM
PROC Star
Posts: 7,356

panel data - drop panelists satisfying a condition

There are a number of ways to do it.  I happen to like to use the datastep.  One way to do it with a datastep would be:

data panel;

  input ID Year income;

  cards;

1 2004 50000

1 2005 60000

1 2006 70000

1 2007 80000

2 2004 60000

2 2005 70000

2 2006 80000

2 2007 90000

2 2008 100000

2 2009 110000

3 2004 60000

3 2005 70000

3 2006 80000

3 2007 90000

3 2008 100000

;

data want;

  do until (last.id);

    set panel;

    by id;

    if first.id then do;

      n=0;

            average_salary=0;

    end;

    if year eq 2009 then drop=1;

    average_salary+income;

    n+1;

  end;

  do until (last.id);

    set panel;

    by id;

    if drop ne 1 then do;

      if first.id then

       average_salary=average_salary/n;

      output;

    end;

  end;

run;

View solution in original post


All Replies
Contributor
Posts: 44

panel data - drop panelists satisfying a condition

second question-

let's say I have a variable "income" for each panlist, and it takes a different value each year for each panelist. how can i calculate average income, across years, for each panelist?

thanks,

C

PROC Star
Posts: 7,356

panel data - drop panelists satisfying a condition

Can any panelist have a year greater than 2009?

Contributor
Posts: 44

panel data - drop panelists satisfying a condition

yes. as long as 2009 is not there.

Solution
‎10-28-2011 12:44 AM
PROC Star
Posts: 7,356

panel data - drop panelists satisfying a condition

There are a number of ways to do it.  I happen to like to use the datastep.  One way to do it with a datastep would be:

data panel;

  input ID Year income;

  cards;

1 2004 50000

1 2005 60000

1 2006 70000

1 2007 80000

2 2004 60000

2 2005 70000

2 2006 80000

2 2007 90000

2 2008 100000

2 2009 110000

3 2004 60000

3 2005 70000

3 2006 80000

3 2007 90000

3 2008 100000

;

data want;

  do until (last.id);

    set panel;

    by id;

    if first.id then do;

      n=0;

            average_salary=0;

    end;

    if year eq 2009 then drop=1;

    average_salary+income;

    n+1;

  end;

  do until (last.id);

    set panel;

    by id;

    if drop ne 1 then do;

      if first.id then

       average_salary=average_salary/n;

      output;

    end;

  end;

run;

Contributor
Posts: 44

panel data - drop panelists satisfying a condition

thank you, sir!

Super User
Posts: 9,662

panel data - drop panelists satisfying a condition

SQL is easy for this situation.

data panel;
  input ID Year income;
  cards;
1 2004 50000
1 2005 60000
1 2006 70000
1 2007 80000
2 2004 60000
2 2005 70000
2 2006 80000
2 2007 90000
2 2008 100000
2 2009 110000
3 2004 60000
3 2005 70000
3 2006 80000
3 2007 90000
3 2008 100000
;
run;

proc sql;
 create table want as
  select *,avg(income) as avg_income
   from panel
    group by id
     having sum(year=2009)=0;quit;

Ksharp

Contributor
Posts: 44

panel data - drop panelists satisfying a condition

thanks, Ksharp. this almost did the job. except, I need to take average including .'s. In other words, the average is taken only using non-blank values. I want to divide total by (#blank + #non-blank).

thanks,

C

PROC Star
Posts: 7,356

panel data - drop panelists satisfying a condition

You could still use KSharp's suggested method, but you would have to include the calculation of average.  e.g.,

data panel;

  input ID Year income;

  cards;

1 2004 50000

1 2005 .

1 2006 .

1 2007 50000

2 2004 60000

2 2005 70000

2 2006 80000

2 2007 90000

2 2008 100000

2 2009 110000

3 2004 60000

3 2005 70000

3 2006 80000

3 2007 90000

3 2008 100000

;

run;

proc sql;

create table want as

  select *,sum(income)/count(*) as avg_income

   from panel

    group by id

     having sum(year=2009)=0;

quit;

Super User
Super User
Posts: 6,495

panel data - drop panelists satisfying a condition

I am not sure I understand this requirement.  Are you saying that if a panelist did not report an income for a year you want to treat that as $0?

Another method that I like to use that requires fewer copies of the data is create a selection criteria that can be used in a where statement.

So for your example I would get a list of id values and use it to subset the data.

proc sql noprint ;

   select distinct id into :id2009 separate by ' ' from have where year=2009;

quit;

proc means data=have;

   where id not in (&id2009);

   var income;

run;

Super User
Posts: 9,662

panel data - drop panelists satisfying a condition

Art has already given you the correct code.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 163 views
  • 0 likes
  • 4 in conversation