Help using Base SAS procedures

A retain question maybe?

Reply
Frequent Contributor
Posts: 101

A retain question maybe?

Hello,

Hope someone can help with this coding data step. I've a data set for 3 years of data (2009-2011) and it's aggregated by ID (just a combination of area, sex and AgeGroup) for visits, cases and amount. However not all IDs have the data for each year and I need to insert a line for the missing years - with repeating info ID, area, sex, and age group and assigning 0 for the variables visits, cases, and amount. I think I need to use retain to do it but got stuck. So much appreciated for any help from the experts on the forum.

data data; input

ID $ Year Area $ Area_desc $ sex $ AgeGroup $ AgeGroup_desc $ Visits Cases Amount ;

datalines;

0100M 2009 01 Area1 M 00 00     2 2 109

0103M 2009 01 Area1 M 03 10-14     1 1 365

0103M 2010 01 Area1 M 03 10-14     1 1 365

0103M 2011 01 Area1 M 03 10-14     1 1 362

0104F 2011 01 Area1 F 04 15-19     1 1 13

0104M 2010 01 Area1 M 04 15-19     1 2 53

;

Desired output (I added a line

0100M 2009 01 Area1 M 00 00     2 2 109

0100M 2010 01 Area1 M 00 00     0 0  0

0100M 2011 01 Area1 M 00 00     0 0  0

0103M 2009 01 Area1 M 03 10-14     1 1 365

0103M 2010 01 Area1 M 03 10-14     1 1 365

0103M 2011 01 Area1 M 03 10-14     1 1 362

0104F 2009 01 Area1 F 04 15-19     0 0 0

0104F 2010 01 Area1 F 04 15-19     0 0 0

0104F 2011 01 Area1 F 04 15-19     1 1 13

0104M 2009 01 Area1 M 04 15-19     0 0 0

0104M 2010 01 Area1 M 04 15-19     1 2 53

0104M 2011 01 Area1 M 04 15-19     0 0 0

;

Super User
Super User
Posts: 6,495

Re: A retain question maybe?

Build an empty frame with all values of YEAR and merge with the source data.

data have;

input

   ID $ Year Area $ Area_desc $ sex $

   AgeGroup $ AgeGroup_desc $ Visits Cases Amount

;

datalines;

0100M 2009 01 Area1 M 00 00     2 2 109

0103M 2009 01 Area1 M 03 10-14     1 1 365

0103M 2010 01 Area1 M 03 10-14     1 1 365

0103M 2011 01 Area1 M 03 10-14     1 1 362

0104F 2011 01 Area1 F 04 15-19     1 1 13

0104M 2010 01 Area1 M 04 15-19     1 2 53

;

data squareup ;

  set have ;

  by id ;

  if first.id ;

  visits=0;

  cases=0;

  amount=0;

  do year = 2009 to 2011 ;

    output;

  end;

run;

data want ;

  merge squareup have;

  by id year;

run;


proc print; run;

Super Contributor
Posts: 1,040

Re: A retain question maybe?

Hi,

I have a question...

data squareup ;

  set have ;

  by id ;

    <<<---------------------may be here?

  if first.id ;

  visits=0;

  cases=0;

  amount=0;

  do year = 2009 to 2011 ;   /*doesn't this DO statement have to be at the top??

    output;

  end;

run;

data want ;

  merge squareup have;

  by id year;

run;

Super User
Super User
Posts: 6,495

Re: A retain question maybe?

Not really.

This program is using subsetting IF statement. The line right after your arrow is key as it is what subsets the data from the source to one observation per ID.  Then the next three assignment statements force those three variables to be zero.  Now we have the dummy data that we want to replicate for each of the three possible year values.

Respected Advisor
Posts: 3,777

Re: A retain question maybe?

Just like Tom's but different.

data data;
  
input ID $ Year Area $ Area_desc $ sex $ AgeGroup $ AgeGroup_desc $ Visits Cases Amount;
datalines;
0100M 2009 01 Area1 M 00 00        2 2 109
0103M 2009 01 Area1 M 03 10-14     1 1 365
0103M 2010 01 Area1 M 03 10-14     1 1 365
0103M 2011 01 Area1 M 03 10-14     1 1 362
0104F 2011 01 Area1 F 04 15-19     1 1 13
0104M 2010 01 Area1 M 04 15-19     1 2 53
;;;;
   run;
data frame;
   set data(drop=visits--amount);
   by id;
   if first.id;
   if 0 then set data(keep=visits--amount);
   retain visits--amount 0;
  
do year = 2009 to 2011;
     
output;
     
end;
  
run;
data data2;
   update frame data(keep=id year visits--amount);
   by id year;
   run;
Frequent Contributor
Posts: 101

Re: A retain question maybe?

Both are clever ways of doing it. Already applied to my data and it worked. Thanks much.

Super User
Posts: 9,662

Re: A retain question maybe?

SQL version:

data have;
input
   ID $ Year Area $ Area_desc $ sex $
   AgeGroup $ AgeGroup_desc $ Visits Cases Amount
;
datalines;
0100M 2009 01 Area1 M 00 00     2 2 109
0103M 2009 01 Area1 M 03 10-14     1 1 365
0103M 2010 01 Area1 M 03 10-14     1 1 365
0103M 2011 01 Area1 M 03 10-14     1 1 362
0104F 2011 01 Area1 F 04 15-19     1 1 13
0104M 2010 01 Area1 M 04 15-19     1 2 53
;
run;

 
proc sql;
create table squareup as 
select * from
 (select distinct ID , Area , Area_desc , sex ,AgeGroup , AgeGroup_desc ,0 as  Visits,0 as Cases,0 as Amount
  from have),(select distinct year from have)
order by id,year;
quit;


 

data want ;
  merge squareup have;
  by id year Area  Area_desc  sex AgeGroup  AgeGroup_desc ;
run;

Ksharp

Ask a Question
Discussion stats
  • 6 replies
  • 285 views
  • 0 likes
  • 5 in conversation