BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jkf91
Calcite | Level 5

Hi all,

I have a (probably) simple question.

I havea  panel data that looks like:

ID

 

Year

 

1

 

2004

 

1

 

2005

 

1

 

2006

 

1

 

2007

 

2

 

2004

 

2

 

2005

 

2

 

2006

 

2

 

2007

 

2

 

2008

 

2

 

2009

 

What I want to do is... I want to ignore all panelists that had year 2009.

Can someone please tell me how I can do this?

Basically, I want SAS to go through each panelist and indicate (or remove) those that have 2009 year.

Thanks,

C

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

There are a number of ways to do it.  I happen to like to use the datastep.  One way to do it with a datastep would be:

data panel;

  input ID Year income;

  cards;

1 2004 50000

1 2005 60000

1 2006 70000

1 2007 80000

2 2004 60000

2 2005 70000

2 2006 80000

2 2007 90000

2 2008 100000

2 2009 110000

3 2004 60000

3 2005 70000

3 2006 80000

3 2007 90000

3 2008 100000

;

data want;

  do until (last.id);

    set panel;

    by id;

    if first.id then do;

      n=0;

            average_salary=0;

    end;

    if year eq 2009 then drop=1;

    average_salary+income;

    n+1;

  end;

  do until (last.id);

    set panel;

    by id;

    if drop ne 1 then do;

      if first.id then

       average_salary=average_salary/n;

      output;

    end;

  end;

run;

View solution in original post

10 REPLIES 10
jkf91
Calcite | Level 5

second question-

let's say I have a variable "income" for each panlist, and it takes a different value each year for each panelist. how can i calculate average income, across years, for each panelist?

thanks,

C

art297
Opal | Level 21

Can any panelist have a year greater than 2009?

jkf91
Calcite | Level 5

yes. as long as 2009 is not there.

art297
Opal | Level 21

There are a number of ways to do it.  I happen to like to use the datastep.  One way to do it with a datastep would be:

data panel;

  input ID Year income;

  cards;

1 2004 50000

1 2005 60000

1 2006 70000

1 2007 80000

2 2004 60000

2 2005 70000

2 2006 80000

2 2007 90000

2 2008 100000

2 2009 110000

3 2004 60000

3 2005 70000

3 2006 80000

3 2007 90000

3 2008 100000

;

data want;

  do until (last.id);

    set panel;

    by id;

    if first.id then do;

      n=0;

            average_salary=0;

    end;

    if year eq 2009 then drop=1;

    average_salary+income;

    n+1;

  end;

  do until (last.id);

    set panel;

    by id;

    if drop ne 1 then do;

      if first.id then

       average_salary=average_salary/n;

      output;

    end;

  end;

run;

jkf91
Calcite | Level 5

thank you, sir!

Ksharp
Super User

SQL is easy for this situation.

data panel;
  input ID Year income;
  cards;
1 2004 50000
1 2005 60000
1 2006 70000
1 2007 80000
2 2004 60000
2 2005 70000
2 2006 80000
2 2007 90000
2 2008 100000
2 2009 110000
3 2004 60000
3 2005 70000
3 2006 80000
3 2007 90000
3 2008 100000
;
run;

proc sql;
 create table want as
  select *,avg(income) as avg_income
   from panel
    group by id
     having sum(year=2009)=0;quit;

Ksharp

jkf91
Calcite | Level 5

thanks, Ksharp. this almost did the job. except, I need to take average including .'s. In other words, the average is taken only using non-blank values. I want to divide total by (#blank + #non-blank).

thanks,

C

art297
Opal | Level 21

You could still use KSharp's suggested method, but you would have to include the calculation of average.  e.g.,

data panel;

  input ID Year income;

  cards;

1 2004 50000

1 2005 .

1 2006 .

1 2007 50000

2 2004 60000

2 2005 70000

2 2006 80000

2 2007 90000

2 2008 100000

2 2009 110000

3 2004 60000

3 2005 70000

3 2006 80000

3 2007 90000

3 2008 100000

;

run;

proc sql;

create table want as

  select *,sum(income)/count(*) as avg_income

   from panel

    group by id

     having sum(year=2009)=0;

quit;

Tom
Super User Tom
Super User

I am not sure I understand this requirement.  Are you saying that if a panelist did not report an income for a year you want to treat that as $0?

Another method that I like to use that requires fewer copies of the data is create a selection criteria that can be used in a where statement.

So for your example I would get a list of id values and use it to subset the data.

proc sql noprint ;

   select distinct id into :id2009 separate by ' ' from have where year=2009;

quit;

proc means data=have;

   where id not in (&id2009);

   var income;

run;

Ksharp
Super User

Art has already given you the correct code.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 1753 views
  • 0 likes
  • 4 in conversation