I am trying to expand some data by the second to fill missing values. However, the input data has a couple of character variables that do not survive the procedure. It drops everything but the numeric variables. I would like to keep the character variables using the same step method.
Any suggestions would be greatly appreciated.
proc expand data=input out=output to=second method=step;
id TIME;
run;
It makes sense that PROC EXPAND won't propagate character variables: unlike STEP, most of its methods (SPLINE, JOIN, AGGREGATE, and so on) intrinsically require numeric input. You could recode the character values as numeric before running EXPAND; otherwise you'll have to run a DATA step that merges the character variables from INPUT with OUTPUT.
Since you're using STEP, you probably want LOCF (last observation carried forward) for the character variables. This should work (where CVAR1 and CVAR2 are the names of the character variables):
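data want;
  /* Bring the character variables from INPUT back onto the expanded OUTPUT */
  merge output input (keep=time cvar1 cvar2);
  by time;
  array cvars {*} $200 cvar1 cvar2;
  /* One retained slot per character variable, holding its last non-blank value */
  array locf {2} $200 _temporary_;
  do _N_=1 to dim(cvars);
    /* Use the current value if present, otherwise the carried-forward one */
    locf{_N_} = coalescec(cvars{_N_},locf{_N_});
    cvars{_N_} = locf{_N_};
  end;
run;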
BTW, if you don't know in advance how many variables will be in CVARS, but it's never more than, say, 100, you could just declare the LOCF array with 100 elements. Also, this program doesn't require all the character variables to have the same length, but it does assume none is longer than 200 bytes.
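To illustrate the oversized-array idea, here is a minimal sketch, assuming a hypothetical third character variable CVAR3 alongside the two above. Only the first dim(cvars) elements of LOCF are ever touched, so the extra slots are harmless.

```sas
data want;
  /* CVAR3 is a hypothetical extra variable, added here for illustration only */
  merge output input (keep=time cvar1 cvar2 cvar3);
  by time;
  array cvars {*} cvar1 cvar2 cvar3;
  /* Deliberately larger than needed: slots beyond dim(cvars) stay unused */
  array locf {100} $200 _temporary_;
  do _N_ = 1 to dim(cvars);
    locf{_N_} = coalescec(cvars{_N_}, locf{_N_});
    cvars{_N_} = locf{_N_};
  end;
run;
```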
That worked perfectly! Thank you so much for your help.
For future reference if anyone else uses this, the 2 in locf{2} should be changed to match the number of cvars you have.
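If you'd rather not count the variables by hand, one possible approach is to let PROC SQL build the list and the count for you. This is only a sketch: it assumes INPUT lives in the WORK library, that TIME is numeric (so it is not picked up as a character variable), and that there is at least one character variable. The macro variable names CVARLIST and NCVARS are made up for this example.

```sas
/* Collect the names of all character variables in WORK.INPUT */
proc sql noprint;
  select name into :cvarlist separated by ' '
    from dictionary.columns
    where libname = 'WORK' and memname = 'INPUT' and type = 'char';
quit;
/* SQLOBS holds the number of rows the SELECT returned, i.e. the variable count */
%let ncvars = &sqlobs;

data want;
  merge output input (keep=time &cvarlist);
  by time;
  array cvars {*} &cvarlist;
  /* Sized from the count above instead of a hard-coded 2 */
  array locf {&ncvars} $200 _temporary_;
  do _N_ = 1 to dim(cvars);
    locf{_N_} = coalescec(cvars{_N_}, locf{_N_});
    cvars{_N_} = locf{_N_};
  end;
run;
```

The 200-byte limit mentioned earlier still applies to the LOCF array.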