I have a long file with data like this:
Some projids have one or more of the 13 rows missing, but it's random as to which row(s) are missing. What is the easiest way to write code to add the rows that are missing for each projid?
Thanks!!
This program has two data steps, but only demands the original data from storage be retrieved once, because it uses a data set view.
proc format;
value timept 0="Baseline" 1="1 year" 2="2 year" 3="3 year"
4="4 year" 5="5 year" 6="6 year" 7="7 year" 8="8 year"
9="9 year" 10="10 year" 11="11 year" 12="12 year" 13="13 year"
14="14 year" 15="15 year" 16="16 year" 17="17 year"
18="18 year" 19="19 year";
data years;
input id time var1;
format time timept. ;
cards;
12345 0 0.089
12345 1 0.099
12345 3 0.0762
12345 5 0.103
12345 6 0.089
12345 7 0.089
12345 8 0.089
12345 9 0.089
12345 10 0.089
12345 11 0.089
run;
data all_time_values/view=all_time_values ;
set years (keep=id);
by id;
if first.id;
do time=0 to 13; /* Baseline to Year 13 */
output;
end;
run;
data want ;
merge all_time_values years;
by id time;
run;
This assumes the data are sorted by ID and time.
How many obs should each projid have? Is the number constant?
What is the real contents of "time"? Can you show the unformatted values?
Can you please post data in usable form, so that we have something to work with?
Usable form = data step with datalines.
Here is my code that works on fake data. This could be extremely slow on large data sets. But really, we need answers to all the questions from @andreas_lds
data years;
input id time var1;
cards;
12345 1 0.089
12345 2 0.099
12345 3 0.0762
12345 5 0.103
12345 6 0.089
12345 7 0.089
12345 8 0.089
12345 9 0.089
12345 10 0.089
12345 11 0.089
;
data thirteen;
do time=1 to 13;
output;
end;
run;
proc sql;
create table ids as select distinct id from years;
create table all_years as select distinct a.id,b.time from years as a,thirteen as b; /* Cartesian join */
create table want as select a.id,a.time,b.var1 from all_years as a left join years as b
on a.id=b.id and a.time=b.time;
quit;
Each projid should have 13 observations/rows. In the example I show in the screenshot, the original data source does not include a row for observation 10 (for 10 years). I need to get a blank row added that would be for observation 10 (10 years).
In another example, the projid had 2 observations (baseline and 1 year), so in this case I need to add the blank rows for years 3 through 13.
This code shows how the time variable is formatted.
value timept 0="Baseline" 1="1 year" 2="2 year" 3="3 year" 4="4 year" 5="5 year" 6="6 year" 7="7 year" 8="8 year" 9="9 year" 10="10 year" 11="11 year" 12="12 year" 13="13 year" 14="14 year" 15="15 year" 16="16 year" 17="17 year" 18="18 year" 19="19 year";
Thanks!
This will fill missing rows at the end and in the middle.
data have;
do projid='28533', '29344';
do year = 0 to 3, 5;
y = ranuni(1234);
output;
end;
end;
run;
proc format;
value xyear(notsorted)
0 = 'Baseline'
1 = '1 Year'
2 = '2 Year'
3 = '3 Year'
4 = '4 Year'
5 = '5 Year'
6 = '6 Year'
7 = '7 Year'
;
quit;
proc summary data=have nway missing completetypes;
by projid;
class year / preloadfmt;
format year xyear.;
output out=expanded(drop=_type_);
run;
data want;
merge expanded have;
by projid year;
run;
proc print;
run;
This program has two data steps, but only demands the original data from storage be retrieved once, because it uses a data set view.
proc format;
value timept 0="Baseline" 1="1 year" 2="2 year" 3="3 year"
4="4 year" 5="5 year" 6="6 year" 7="7 year" 8="8 year"
9="9 year" 10="10 year" 11="11 year" 12="12 year" 13="13 year"
14="14 year" 15="15 year" 16="16 year" 17="17 year"
18="18 year" 19="19 year";
data years;
input id time var1;
format time timept. ;
cards;
12345 0 0.089
12345 1 0.099
12345 3 0.0762
12345 5 0.103
12345 6 0.089
12345 7 0.089
12345 8 0.089
12345 9 0.089
12345 10 0.089
12345 11 0.089
run;
data all_time_values/view=all_time_values ;
set years (keep=id);
by id;
if first.id;
do time=0 to 13; /* Baseline to Year 13 */
output;
end;
run;
data want ;
merge all_time_values years;
by id time;
run;
This assumes the data are sorted by ID and time.
This worked beautifully! Thank you!
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.