Hello,
Hope someone can help with this coding data step. I've a data set for 3 years of data (2009-2011) and it's aggregated by ID (just a combination of area, sex and AgeGroup) for visits, cases and amount. However not all IDs have the data for each year and I need to insert a line for the missing years - with repeating info ID, area, sex, and age group and assigning 0 for the variables visits, cases, and amount. I think I need to use retain to do it but got stuck. So much appreciated for any help from the experts on the forum.
data data; input
ID $ Year Area $ Area_desc $ sex $ AgeGroup $ AgeGroup_desc $ Visits Cases Amount ;
datalines;
0100M 2009 01 Area1 M 00 00 2 2 109
0103M 2009 01 Area1 M 03 10-14 1 1 365
0103M 2010 01 Area1 M 03 10-14 1 1 365
0103M 2011 01 Area1 M 03 10-14 1 1 362
0104F 2011 01 Area1 F 04 15-19 1 1 13
0104M 2010 01 Area1 M 04 15-19 1 2 53
;
Desired output (I added a line
0100M 2009 01 Area1 M 00 00 2 2 109
0100M 2010 01 Area1 M 00 00 0 0 0
0100M 2011 01 Area1 M 00 00 0 0 0
0103M 2009 01 Area1 M 03 10-14 1 1 365
0103M 2010 01 Area1 M 03 10-14 1 1 365
0103M 2011 01 Area1 M 03 10-14 1 1 362
0104F 2009 01 Area1 F 04 15-19 0 0 0
0104F 2010 01 Area1 F 04 15-19 0 0 0
0104F 2011 01 Area1 F 04 15-19 1 1 13
0104M 2009 01 Area1 M 04 15-19 0 0 0
0104M 2010 01 Area1 M 04 15-19 1 2 53
0104M 2011 01 Area1 M 04 15-19 0 0 0
;
Build an empty frame with all values of YEAR and merge with the source data.
data have;
input
ID $ Year Area $ Area_desc $ sex $
AgeGroup $ AgeGroup_desc $ Visits Cases Amount
;
datalines;
0100M 2009 01 Area1 M 00 00 2 2 109
0103M 2009 01 Area1 M 03 10-14 1 1 365
0103M 2010 01 Area1 M 03 10-14 1 1 365
0103M 2011 01 Area1 M 03 10-14 1 1 362
0104F 2011 01 Area1 F 04 15-19 1 1 13
0104M 2010 01 Area1 M 04 15-19 1 2 53
;
data squareup ;
set have ;
by id ;
if first.id ;
visits=0;
cases=0;
amount=0;
do year = 2009 to 2011 ;
output;
end;
run;
data want ;
merge squareup have;
by id year;
run;
proc print; run;
Hi,
I have a question...
data squareup ;
set have ;
by id ;
<<<---------------------may be here?
if first.id ;
visits=0;
cases=0;
amount=0;
do year = 2009 to 2011 ; /*doesn't this DO statement have to be at the top??
output;
end;
run;
data want ;
merge squareup have;
by id year;
run;
Not really.
This program is using subsetting IF statement. The line right after your arrow is key as it is what subsets the data from the source to one observation per ID. Then the next three assignment statements force those three variables to be zero. Now we have the dummy data that we want to replicate for each of the three possible year values.
Just like Tom's but different.
Both are clever ways of doing it. Already applied to my data and it worked. Thanks much.
SQL version:
data have; input ID $ Year Area $ Area_desc $ sex $ AgeGroup $ AgeGroup_desc $ Visits Cases Amount ; datalines; 0100M 2009 01 Area1 M 00 00 2 2 109 0103M 2009 01 Area1 M 03 10-14 1 1 365 0103M 2010 01 Area1 M 03 10-14 1 1 365 0103M 2011 01 Area1 M 03 10-14 1 1 362 0104F 2011 01 Area1 F 04 15-19 1 1 13 0104M 2010 01 Area1 M 04 15-19 1 2 53 ; run; proc sql; create table squareup as select * from (select distinct ID , Area , Area_desc , sex ,AgeGroup , AgeGroup_desc ,0 as Visits,0 as Cases,0 as Amount from have),(select distinct year from have) order by id,year; quit; data want ; merge squareup have; by id year Area Area_desc sex AgeGroup AgeGroup_desc ; run;
Ksharp
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.