Hello,
The problem I have is the following: In my program, I want to create a new variable when a current variable has a missing value. This is what I already have:
data basetabel_imputed;
set basetablenum;
array wit{*} _numeric_;
do i=1 to dim(wit);
if wit(i)=. then do;
wit(i)= symget(vname(wit(i));
end;
end;
drop i;
run;
I read in the variables, make an array of the numeric ones and then, for every numeric variable, I change the missing value by the median of that variable. Now, in the second do loop, I want to create a new variable, that has the same name as the variable currently accessed by the do loop, except that I want to add the suffix mv to it. Then I want this new variable to take the value 1. What I want is in fact a statement that does:
vname(wit(i))||mv = 1;
(but his doensn't work because one cannot create a new variable name like that).
Matthijs
Cannot do it that way. You will need to use a little code generation.
You could query the SAS metadata to get a list of the variables and generate the new variable names.
proc sql noprint ;
select name, trim(name)||'mv'
into :vlist separted by ' '
, :vlist2 separated by ' '
from dictionary.columns
where libname='WORK'
and memname='BASETABLENUM'
and type='num'
;
quit;
data basetabel_imputed;
set basetablenum;
array wit &vlist;
array new &vlist2;
do i=1 to dim(wit);
new(i) = (wit(i) = .);
if new(i) then do;
wit(i)= symget(vname(wit(i));
end;
end;
drop i;
run;
Now I am confused about why you are using macro variables to transfer the medians. It is much better use SAS datasets to do that.
proc sql noprint ;
select name
, trim(name)||'mv'
, trim(name)||'median'
into :vlist separted by ' '
, :vlist2 separated by ' '
, :vlist3 separated by ' '
from dictionary.columns
where libname='WORK'
and memname='BASETABLENUM'
and type='num'
;
quit;
proc summary data=basetablenum ;
var &vlist ;
output out=basetable_medians median=&vlist3;
run;
data basetabel_imputed;
set basetablenum;
if _N_=1 then set basetable_medians;
array wit &vlist;
array new &vlist2;
array median &vlist3 ;
do i=1 to dim(wit);
new(i) = (wit(i) = .);
if new(i) then do;
wit(i)= median(i);
end;
end;
drop i &vlist3;
run;
Cannot do it that way. You will need to use a little code generation.
You could query the SAS metadata to get a list of the variables and generate the new variable names.
proc sql noprint ;
select name, trim(name)||'mv'
into :vlist separted by ' '
, :vlist2 separated by ' '
from dictionary.columns
where libname='WORK'
and memname='BASETABLENUM'
and type='num'
;
quit;
data basetabel_imputed;
set basetablenum;
array wit &vlist;
array new &vlist2;
do i=1 to dim(wit);
new(i) = (wit(i) = .);
if new(i) then do;
wit(i)= symget(vname(wit(i));
end;
end;
drop i;
run;
Now I am confused about why you are using macro variables to transfer the medians. It is much better use SAS datasets to do that.
proc sql noprint ;
select name
, trim(name)||'mv'
, trim(name)||'median'
into :vlist separted by ' '
, :vlist2 separated by ' '
, :vlist3 separated by ' '
from dictionary.columns
where libname='WORK'
and memname='BASETABLENUM'
and type='num'
;
quit;
proc summary data=basetablenum ;
var &vlist ;
output out=basetable_medians median=&vlist3;
run;
data basetabel_imputed;
set basetablenum;
if _N_=1 then set basetable_medians;
array wit &vlist;
array new &vlist2;
array median &vlist3 ;
do i=1 to dim(wit);
new(i) = (wit(i) = .);
if new(i) then do;
wit(i)= median(i);
end;
end;
drop i &vlist3;
run;
Thank you very much Tom!
In fact, this was the code that came before:
proc means data=basetablenum noprint;
var _numeric_;
output out=three (drop=_freq_ _type_)
median= /* / autoname */ ;
run;
data _NULL_;
set three;
array ass{*} _numeric_;
do i=1 to dim(ass);
call symput(vname(ass(i)),ass(i));
end;
drop i;
run;
I'm not a very experienced SAS-programmer, so I just took the median and then converted that to macro variables. I assume your code is much more efficient than mine, so thank you!
The problem with moving the data into macro variables and back is more that you could lose precision. Macro variables are just character strings and so will not match exactly the original floating point numbers.
Also if you use datasets then you can easily add grouping variables by using BY statements.
Hi ... here's another approach ... hopefully I understand the task ...
#1 replace missing numeric values with medians
#2 construct a new set of variables with 0/1 values with 1 indicating a value was replaced by the median
If I got the task correct, the only difference between your request about new variable names and what is
produced by the following is that the "MV" is used as a prefix (not a suffix) for the new variables (maybe
this is a better result since it allows you to refer to all those variables as a group using a wildcard, MV:) .
* make a data set with some missng values;
data class;
set sashelp.class;
if ranuni(987) le .5;
if ranuni(987) le .2 then
call missing(weight);
if ranuni(987) le .1 then
call missing(sex, age);
run;
Name Sex Age Height Weight
Alice . 56.5 84.0
Jane F 12 59.8 84.5
Janet F 15 62.5 112.5
John M 12 59.0 .
Joyce F 11 51.3 50.5
Judy F 14 64.3 90.0
Mary F 15 66.5 112.0
Philip M 16 72.0 .
Robert . 64.8 .
Thomas M 11 57.5 85.0
* use PROC STDIZE to replace missing values with medians, keep original values with new names (prefix MV);
proc stdize data=class out=_class reponly oprefix=mv method=median;
run;
* change MV variables to 0/1;
data _class;
set _class;
array mv(*) mv: ;
do _n_=1 to dim(mv);
mv(_n_) = missing(mv(_n_));
end;
run;
mv mv mv
Name Sex Age Height Weight Age Height Weight
Alice 1 0 0 13 56.5 84.0
Jane F 0 0 0 12 59.8 84.5
Janet F 0 0 0 15 62.5 112.5
John M 0 0 1 12 59.0 85.0
Joyce F 0 0 0 11 51.3 50.5
Judy F 0 0 0 14 64.3 90.0
Mary F 0 0 0 15 66.5 112.0
Philip M 0 0 1 16 72.0 85.0
Robert 1 0 1 13 64.8 85.0
Thomas M 0 0 0 11 57.5 85.0
You completely understand the question Mike. Wauw, this is a very short solution, thank you so much. (Just noting that it only works in sas 9.3, sas 9.2 doesn't seem to have the prefix options yet)
I think that you could still use Mike's solution in pre-9.3, you would just have to replace the prefix option with a three-way merge. E.g.:
data class;
set sashelp.class;
if ranuni(987) le .5;
if ranuni(987) le .2 then
call missing(weight);
if ranuni(987) le .1 then
call missing(sex, age);
run;
* use PROC STDIZE to replace missing values with medians, keep original values with new names (prefix MV);
proc stdize data=class out=_class reponly method=median;
run;
data _class (drop=orig:);
merge class (rename=(age=original_age
height=original_height
weight=original_weight))
_class(drop=sex rename=
(age=mv_age height=mv_height weight=mv_weight))
_class(drop=sex);
by name;
array mv(*) mv: ;
array original(*) orig:;
do _n_=1 to dim(mv);
mv(_n_) = missing(original(_n_));
end;
run;
Hi ... here's a 9.2 solution. It gives a warning in PROC SQL ...
WARNING: This CREATE TABLE statement recursively references the target table. A consequence of this is a possible data integrity problem.
so, you might want to use a new data set name in the CREATE TABLE portion (it does work with these data even with the warning).
data class;
set sashelp.class;
if ranuni(987) le .5;
if ranuni(987) le .2 then
call missing(weight);
if ranuni(987) le .1 then
call missing(sex, age);
run;
proc stdize data=class out=_class reponly method=median;
var _numeric_;
run;
proc sql noprint;
select catt(name,'=mv',name) into :rename separated by ' '
from dictionary.columns
where libname eq 'WORK' and memname eq 'CLASS' and type eq 'num';
create table _class as
select *
from _class natural join class (rename=(&rename));
quit;
* change MV variables to 0/1;
data _class;
set _class;
array mv(*) mv: ;
do _n_=1 to dim(mv);
mv(_n_) = missing(mv(_n_));
end;
run;
Or you can avoid the JOIN in SQL and just use a one-to-one merge ...
proc sql noprint;
select catt(name,'=mv',name) into :rename separated by ' '
from dictionary.columns
where libname eq 'WORK' and memname eq 'CLASS' and type eq 'num';
quit;
* change MV variables to 0/1;
data _class;
merge _class class (rename=(&rename));
array mv(*) mv: ;
do _n_=1 to dim(mv);
mv(_n_) = missing(mv(_n_));
end;
run;
matthijs1
it seems a little redundant to use a whole numeric variable to hold a 0/1 flag (indicating which missing var. has been replaced with a median).
If you used a string with a character for each (numeric) variable, just set "1" for each var
The following code adapts your original,
%let n_numerics = 200 ; * or any other number that is sufficient (under 32768) ;
data basetabel_imputed ;
length median_indicators $&n_numerics ;
array witM(&n_numerics) _temporary_ ;
array wit{*} _numeric_;
if missing( witM(1) ) then do ;
*** collect medians from proc means output ;
set three ;
do i=1 to dim( wit) ;
witM(i) = wit(i) ;
end ;
end ;
set basetablenum;
array wit{*} _numeric_;
do i=1 to dim(wit);
if wit(i)=. then do;
wit(i)= witM(i);
substr( median_indicators, i, i ) = '1' ;
end;
end;
drop i;
run;
Then you have applied the medians and each row has a string with a 1 for each numeric with a median applied. When blank, no medians were applied.
Should you need to replace that 200 constant with the actual number of numeric variables, it can be obtained with just a llittle more code: My preference is:
proc contents data= your.dataset noprint out=_data_ ; run ;
proc sql noprint ;
%let n_numerics = 0 ;
select count(*) into :n_numerics separated by ' '
from &syslast where type=1 /*numeric*/
;
%put NOTE: there are &n_numerics numeric variables ;
quit ;
Hi ... here's another way to make that vector of median substitution indicators (from one of
Howard Schreier's SAS-L postings on the use of _FILE_) rather than a group of new variables.
(_FILE_ provides a way to concatenate formatted values of variables).
* make a data set with some missng values;
data class;
set sashelp.class;
if ranuni(987) le .5;
if ranuni(987) le .2 then
call missing(weight);
if ranuni(987) le .1 then
call missing(sex, age);
run;
proc format;
value fix low-high = '0' other = '1';
run;
filename nosee dummy;
data class;
file nosee;
set class;
put (_numeric_) (fix.) @;
median_indicators = _file_;
put;
run;
proc stdize data=class out=_class reponly method=median;
run;
median_
Name Sex Age Height Weight indicators
Alice 13 56.5 84.0 100
Jane F 12 59.8 84.5 000
Janet F 15 62.5 112.5 000
John M 12 59.0 85.0 001
Joyce F 11 51.3 50.5 000
Judy F 14 64.3 90.0 000
Mary F 15 66.5 112.0 000
Philip M 16 72.0 85.0 001
Robert 13 64.8 85.0 101
Thomas M 11 57.5 85.0 000
as usual with Howard's stuff, it's a really neat idea!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.