BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jkf91
Calcite | Level 5

Hi all,

I have a (probably) simple question.

I havea  panel data that looks like:

ID

 

Year

 

1

 

2004

 

1

 

2005

 

1

 

2006

 

1

 

2007

 

2

 

2004

 

2

 

2005

 

2

 

2006

 

2

 

2007

 

2

 

2008

 

2

 

2009

 

What I want to do is... I want to ignore all panelists that had year 2009.

Can someone please tell me how I can do this?

Basically, I want SAS to go through each panelist and indicate (or remove) those that have 2009 year.

Thanks,

C

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

There are a number of ways to do it.  I happen to like to use the datastep.  One way to do it with a datastep would be:

data panel;

  input ID Year income;

  cards;

1 2004 50000

1 2005 60000

1 2006 70000

1 2007 80000

2 2004 60000

2 2005 70000

2 2006 80000

2 2007 90000

2 2008 100000

2 2009 110000

3 2004 60000

3 2005 70000

3 2006 80000

3 2007 90000

3 2008 100000

;

data want;

  do until (last.id);

    set panel;

    by id;

    if first.id then do;

      n=0;

            average_salary=0;

    end;

    if year eq 2009 then drop=1;

    average_salary+income;

    n+1;

  end;

  do until (last.id);

    set panel;

    by id;

    if drop ne 1 then do;

      if first.id then

       average_salary=average_salary/n;

      output;

    end;

  end;

run;

View solution in original post

10 REPLIES 10
jkf91
Calcite | Level 5

second question-

let's say I have a variable "income" for each panlist, and it takes a different value each year for each panelist. how can i calculate average income, across years, for each panelist?

thanks,

C

art297
Opal | Level 21

Can any panelist have a year greater than 2009?

jkf91
Calcite | Level 5

yes. as long as 2009 is not there.

art297
Opal | Level 21

There are a number of ways to do it.  I happen to like to use the datastep.  One way to do it with a datastep would be:

data panel;

  input ID Year income;

  cards;

1 2004 50000

1 2005 60000

1 2006 70000

1 2007 80000

2 2004 60000

2 2005 70000

2 2006 80000

2 2007 90000

2 2008 100000

2 2009 110000

3 2004 60000

3 2005 70000

3 2006 80000

3 2007 90000

3 2008 100000

;

data want;

  do until (last.id);

    set panel;

    by id;

    if first.id then do;

      n=0;

            average_salary=0;

    end;

    if year eq 2009 then drop=1;

    average_salary+income;

    n+1;

  end;

  do until (last.id);

    set panel;

    by id;

    if drop ne 1 then do;

      if first.id then

       average_salary=average_salary/n;

      output;

    end;

  end;

run;

jkf91
Calcite | Level 5

thank you, sir!

Ksharp
Super User

SQL is easy for this situation.

data panel;
  input ID Year income;
  cards;
1 2004 50000
1 2005 60000
1 2006 70000
1 2007 80000
2 2004 60000
2 2005 70000
2 2006 80000
2 2007 90000
2 2008 100000
2 2009 110000
3 2004 60000
3 2005 70000
3 2006 80000
3 2007 90000
3 2008 100000
;
run;

proc sql;
 create table want as
  select *,avg(income) as avg_income
   from panel
    group by id
     having sum(year=2009)=0;quit;

Ksharp

jkf91
Calcite | Level 5

thanks, Ksharp. this almost did the job. except, I need to take average including .'s. In other words, the average is taken only using non-blank values. I want to divide total by (#blank + #non-blank).

thanks,

C

art297
Opal | Level 21

You could still use KSharp's suggested method, but you would have to include the calculation of average.  e.g.,

data panel;

  input ID Year income;

  cards;

1 2004 50000

1 2005 .

1 2006 .

1 2007 50000

2 2004 60000

2 2005 70000

2 2006 80000

2 2007 90000

2 2008 100000

2 2009 110000

3 2004 60000

3 2005 70000

3 2006 80000

3 2007 90000

3 2008 100000

;

run;

proc sql;

create table want as

  select *,sum(income)/count(*) as avg_income

   from panel

    group by id

     having sum(year=2009)=0;

quit;

Tom
Super User Tom
Super User

I am not sure I understand this requirement.  Are you saying that if a panelist did not report an income for a year you want to treat that as $0?

Another method that I like to use that requires fewer copies of the data is create a selection criteria that can be used in a where statement.

So for your example I would get a list of id values and use it to subset the data.

proc sql noprint ;

   select distinct id into :id2009 separate by ' ' from have where year=2009;

quit;

proc means data=have;

   where id not in (&id2009);

   var income;

run;

Ksharp
Super User

Art has already given you the correct code.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 979 views
  • 0 likes
  • 4 in conversation