say I have something like:
data have;
infile datalines delimiter=' ';
input group count n;
datalines;
9 1 400
9 2 297
9 3 344
9 4 32
9 5 45
9 6 6
9 7 54
9 8 34
9 9 21
10 1 32
10 2 3
10 3 57
10 4 43
10 5 45
10 6 45
10 7 34
10 8 32
10 11 5
;
but for group 10 I need to fill in missing count values per group, so I want count=9 and 10, while for n just fill in 0. Like below:
data have;
infile datalines delimiter=' ';
input group count n;
datalines;
9 1 400
9 2 297
9 3 344
9 4 32
9 5 45
9 6 6
9 7 54
9 8 34
9 9 21
10 1 32
10 2 3
10 3 57
10 4 43
10 5 45
10 6 45
10 7 34
10 8 32
10 9 0
10 10 0
10 11 5
;
I'm thinking there has to be a complete function or similar. Help is appreciated!!
Hi @richart I believe there might be some inbuilt procedures within the SAS ETS module should you have time series dates in your real data to fill. Of course, otherwise you can always rely on a datastep-
data have;
infile datalines delimiter=' ';
input group count n;
datalines;
9 1 400
9 2 297
9 3 344
9 4 32
9 5 45
9 6 6
9 7 54
9 8 34
9 9 21
10 1 32
10 2 3
10 3 57
10 4 43
10 5 45
10 6 45
10 7 34
10 8 32
10 11 5
;
data want ;
do until (last.group) ;
set have curobs = _n ;
by group ;
if _c and dif(count) > 1 then do ;
do count = _c + 1 to count - 1 ;
n = 0 ;
output ;
end ;
set have point = _n ;
output ;
end ;
else output ;
_c =count ;
end ;
drop _c ;
run ;
Hi @richart I believe there might be some inbuilt procedures within the SAS ETS module should you have time series dates in your real data to fill. Of course, otherwise you can always rely on a datastep-
data have;
infile datalines delimiter=' ';
input group count n;
datalines;
9 1 400
9 2 297
9 3 344
9 4 32
9 5 45
9 6 6
9 7 54
9 8 34
9 9 21
10 1 32
10 2 3
10 3 57
10 4 43
10 5 45
10 6 45
10 7 34
10 8 32
10 11 5
;
data want ;
do until (last.group) ;
set have curobs = _n ;
by group ;
if _c and dif(count) > 1 then do ;
do count = _c + 1 to count - 1 ;
n = 0 ;
output ;
end ;
set have point = _n ;
output ;
end ;
else output ;
_c =count ;
end ;
drop _c ;
run ;
Hello @richart ,
The SAS/ETS way as @novinosrin is suggesting :
PROC TIMESERIES data=have out=want;
by group;
id count interval=day
accumulate=median
setmiss=0 ;
var n;
format count 5.;
run;
/* end of program */
Koen
Assuming that max possible value of count is known, you can do something like:
data want;
array temp[100] _temporary_ ; /* set it big enough */
set have;
by group;
temp[count] = n;
_max_ = max(count, _max_);
if last.group then
do;
do count = 1 to _max_;
n = coalesce(temp[count], 0);
output;
end;
call missing(of temp[*], _max_);
end;
drop _max_;
run;
proc print;
run;
Bart
Here another coding variant that should work.
data want;
set have;
by group count;
output;
_lag_count=coalesce(lag(count),1);
do while(coalesce(count,1) - _lag_count > 1);
count=count-1;
n=0;
output;
end;
drop _lag_count;
run;
proc sort data=want;
by group count;
run;
data have;
infile datalines delimiter=' ';
input group count n;
datalines;
9 1 400
9 2 297
9 3 344
9 4 32
9 5 45
9 6 6
9 7 54
9 8 34
9 9 21
10 1 32
10 2 3
10 3 57
10 4 43
10 5 45
10 6 45
10 7 34
10 8 32
10 11 5
;
data want;
merge have have(firstobs=2 keep=group count rename=(group=_group count=_count));
output;
if group=_group then do;
do count=count+1 to _count-1;
n=0;output;
end;
end;
drop _group _count;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.