Hi I have a dataset "Results" that has the cost for each ID per month, along with some other variables (Class1, Class 2, OIC)
Because it is possible to have no cost for a particular month, I want to to fill in these missing rows for each ID, based on another table, "List" which is all of the months already passed for this financial year (starts at April 2019, up to present, January 2020). For the rest of the variables, I just want to retain the values from the row above.
If it is also possible to just start at the initial month in the original data, for example, ID 0002232
example should not have Apr-Aug data because data for it only started at Sep.
data Results;
input ID $ Class1 $ Class2 $ OIC $ Base_YM $ Cost;
datalines;
0901708 JP J Amanda 201904 56
0901708 JP J Amanda 201905 84
0901708 JP J Amanda 201906 26
0901708 JP J Amanda 201908 94
0901708 JP J Amanda 201910 47
0002232 NJP NJ Steve 201909 56
0002232 NJP NJ Steve 201911 84
5548763 NJP NJ Mark 201907 26
5548763 NJP NJ Mark 201908 94
5548763 NJP NJ Mark 201909 47
;
run;
data List;
input Base_YM $;
datalines;
201904
201905
201906
201907
201908
201909
201910
201911
201912
202001
;
run;
data want;
input ID $ Class1 $ Class 2 $ OIC $ Base_YM $ Cost;
datalines;
0901708 JP J Amanda 201904 56
0901708 JP J Amanda 201905 84
0901708 JP J Amanda 201906 26
0901708 JP J Amanda 201907 26
0901708 JP J Amanda 201908 94
0901708 JP J Amanda 201909 94
0901708 JP J Amanda 201910 47
0901708 JP J Amanda 201911 47
0901708 JP J Amanda 201912 47
0901708 JP J Amanda 202001 47
0002232 NJP NJ Steve 201909 56
0002232 NJP NJ Steve 201910 56
0002232 NJP NJ Steve 201911 84
0002232 NJP NJ Steve 201912 84
0002232 NJP NJ Steve 202001 84
5548763 NJP NJ Mark 201907 26
5548763 NJP NJ Mark 201908 94
5548763 NJP NJ Mark 201909 47
5548763 NJP NJ Mark 201910 47
5548763 NJP NJ Mark 201911 47
5548763 NJP NJ Mark 201912 47
5548763 NJP NJ Mark 202001 47
;
run;
Below code should do the job.
Please note that I'm using informat yymmn6. for reading the source data into variable Base_YM. This converts the source string into a SAS Date value.
data List;
input Base_YM :yymmn6.;
format Base_YM yymmn6.;
datalines;
201904
201905
201906
201907
201908
201909
201910
201911
201912
202001
;
data Results;
input ID $ Class1 $ Class2 $ OIC $ Base_YM :yymmn6. Cost;
format Base_YM yymmn6.;
datalines;
0901708 JP J Amanda 201904 56
0901708 JP J Amanda 201905 84
0901708 JP J Amanda 201906 26
0901708 JP J Amanda 201908 94
0901708 JP J Amanda 201910 47
0002232 NJP NJ Steve 201909 56
0002232 NJP NJ Steve 201911 84
5548763 NJP NJ Mark 201907 26
5548763 NJP NJ Mark 201908 94
5548763 NJP NJ Mark 201909 47
;
proc sort data=results;
by id Base_YM;
run;
proc sql noprint;
select put(max(Base_YM),32.) into :max_base_ym trimmed
from list
;
quit;
%put &=max_base_ym is SAS Date value for yyyymm: %sysfunc(putn(&max_base_ym,yymmn6.));
data want(drop=_:);
set results;
by id;
output;
if not last.id then
do;
_n_+1;
set results(keep=base_ym rename=(base_ym=_base_ym)) point=_n_;
_cnt= intck('month',base_ym,_base_ym)-1;
do _i=1 to _cnt;
base_ym=intnx('month',base_ym,_i,'b');
output;
end;
end;
else
if last.id then
do;
_cnt= intck('month',base_ym,&max_base_ym);
do _i=1 to _cnt;
base_ym=intnx('month',base_ym,1,'b');
output;
end;
end;
run;
proc print;
run;
Thank you @Patrick! I modified it slightly after realising I need to set a zero value for cost, so as not to mess up the cumulative values in my next step.
Did this for the do blocks that add a new row:
do _i=1 to _cnt;
base_ym=intnx('month',base_ym,_i,'b');
Cost=0; /*****/
output;
end;
Try the scheme below. Note:
data results ;
input (id class1 class2 oic) ($) base_ym cost ;
cards ;
0901708 jp j amanda 201904 56
0901708 jp j amanda 201905 84
0901708 jp j amanda 201906 26
0901708 jp j amanda 201908 94
0901708 jp j amanda 201910 47
0002232 njp nj steve 201909 56
0002232 njp nj steve 201911 84
5548763 njp nj mark 201907 26
5548763 njp nj mark 201908 94
5548763 njp nj mark 201909 47
;
data list ;
input base_ym ;
cards ;
201904
201905
201906
201907
201908
201909
201910
201911
201912
202001
;
data want (drop = _:) ;
dcl hash h (dataset:"list", ordered:"a") ;
h.definekey ("base_ym") ;
h.definedone () ;
dcl hiter i ("h") ;
i.last() ;
_bl = base_ym ;
do p = 2 by 1 until (z) ;
set results end = z ;
by id notsorted ;
if not last.id then set results (keep=base_ym rename=base_ym=_bm) point = p ;
i.setcur() ;
output ;
do while (i.next() = 0) ;
if (last.id and base_ym <= _bl) or base_ym < _bm then output ;
end ;
end ;
run ;
Kind regards
Paul D.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.