BookmarkSubscribeRSS Feed
Solph
Pyrite | Level 9

Hello,

Hope someone can help with this coding data step. I've a data set for 3 years of data (2009-2011) and it's aggregated by ID (just a combination of area, sex and AgeGroup) for visits, cases and amount. However not all IDs have the data for each year and I need to insert a line for the missing years - with repeating info ID, area, sex, and age group and assigning 0 for the variables visits, cases, and amount. I think I need to use retain to do it but got stuck. So much appreciated for any help from the experts on the forum.

data data; input

ID $ Year Area $ Area_desc $ sex $ AgeGroup $ AgeGroup_desc $ Visits Cases Amount ;

datalines;

0100M 2009 01 Area1 M 00 00     2 2 109

0103M 2009 01 Area1 M 03 10-14     1 1 365

0103M 2010 01 Area1 M 03 10-14     1 1 365

0103M 2011 01 Area1 M 03 10-14     1 1 362

0104F 2011 01 Area1 F 04 15-19     1 1 13

0104M 2010 01 Area1 M 04 15-19     1 2 53

;

Desired output (I added a line

0100M 2009 01 Area1 M 00 00     2 2 109

0100M 2010 01 Area1 M 00 00     0 0  0

0100M 2011 01 Area1 M 00 00     0 0  0

0103M 2009 01 Area1 M 03 10-14     1 1 365

0103M 2010 01 Area1 M 03 10-14     1 1 365

0103M 2011 01 Area1 M 03 10-14     1 1 362

0104F 2009 01 Area1 F 04 15-19     0 0 0

0104F 2010 01 Area1 F 04 15-19     0 0 0

0104F 2011 01 Area1 F 04 15-19     1 1 13

0104M 2009 01 Area1 M 04 15-19     0 0 0

0104M 2010 01 Area1 M 04 15-19     1 2 53

0104M 2011 01 Area1 M 04 15-19     0 0 0

;

6 REPLIES 6
Tom
Super User Tom
Super User

Build an empty frame with all values of YEAR and merge with the source data.

data have;

input

   ID $ Year Area $ Area_desc $ sex $

   AgeGroup $ AgeGroup_desc $ Visits Cases Amount

;

datalines;

0100M 2009 01 Area1 M 00 00     2 2 109

0103M 2009 01 Area1 M 03 10-14     1 1 365

0103M 2010 01 Area1 M 03 10-14     1 1 365

0103M 2011 01 Area1 M 03 10-14     1 1 362

0104F 2011 01 Area1 F 04 15-19     1 1 13

0104M 2010 01 Area1 M 04 15-19     1 2 53

;

data squareup ;

  set have ;

  by id ;

  if first.id ;

  visits=0;

  cases=0;

  amount=0;

  do year = 2009 to 2011 ;

    output;

  end;

run;

data want ;

  merge squareup have;

  by id year;

run;


proc print; run;

robertrao
Quartz | Level 8

Hi,

I have a question...

data squareup ;

  set have ;

  by id ;

    <<<---------------------may be here?

  if first.id ;

  visits=0;

  cases=0;

  amount=0;

  do year = 2009 to 2011 ;   /*doesn't this DO statement have to be at the top??

    output;

  end;

run;

data want ;

  merge squareup have;

  by id year;

run;

Tom
Super User Tom
Super User

Not really.

This program is using subsetting IF statement. The line right after your arrow is key as it is what subsets the data from the source to one observation per ID.  Then the next three assignment statements force those three variables to be zero.  Now we have the dummy data that we want to replicate for each of the three possible year values.

data_null__
Jade | Level 19

Just like Tom's but different.

data data;
  
input ID $ Year Area $ Area_desc $ sex $ AgeGroup $ AgeGroup_desc $ Visits Cases Amount;
datalines;
0100M 2009 01 Area1 M 00 00        2 2 109
0103M 2009 01 Area1 M 03 10-14     1 1 365
0103M 2010 01 Area1 M 03 10-14     1 1 365
0103M 2011 01 Area1 M 03 10-14     1 1 362
0104F 2011 01 Area1 F 04 15-19     1 1 13
0104M 2010 01 Area1 M 04 15-19     1 2 53
;;;;
   run;
data frame;
   set data(drop=visits--amount);
   by id;
   if first.id;
   if 0 then set data(keep=visits--amount);
   retain visits--amount 0;
  
do year = 2009 to 2011;
     
output;
     
end;
  
run;
data data2;
   update frame data(keep=id year visits--amount);
   by id year;
   run;
Solph
Pyrite | Level 9

Both are clever ways of doing it. Already applied to my data and it worked. Thanks much.

Ksharp
Super User

SQL version:

data have;
input
   ID $ Year Area $ Area_desc $ sex $
   AgeGroup $ AgeGroup_desc $ Visits Cases Amount
;
datalines;
0100M 2009 01 Area1 M 00 00     2 2 109
0103M 2009 01 Area1 M 03 10-14     1 1 365
0103M 2010 01 Area1 M 03 10-14     1 1 365
0103M 2011 01 Area1 M 03 10-14     1 1 362
0104F 2011 01 Area1 F 04 15-19     1 1 13
0104M 2010 01 Area1 M 04 15-19     1 2 53
;
run;

 
proc sql;
create table squareup as 
select * from
 (select distinct ID , Area , Area_desc , sex ,AgeGroup , AgeGroup_desc ,0 as  Visits,0 as Cases,0 as Amount
  from have),(select distinct year from have)
order by id,year;
quit;


 

data want ;
  merge squareup have;
  by id year Area  Area_desc  sex AgeGroup  AgeGroup_desc ;
run;

Ksharp

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 888 views
  • 0 likes
  • 5 in conversation