BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
matthijs1
Calcite | Level 5

Hello,

The problem I have is the following: In my program, I want to create a new variable when a current variable has a missing value. This is what I already have:

data basetabel_imputed;

set basetablenum;

array wit{*} _numeric_;

do i=1 to dim(wit);

    if wit(i)=. then do;

    wit(i)= symget(vname(wit(i));

    end;

end;

drop i;

run;

I read in the variables, make an array of the numeric ones and then, for every numeric variable, I change the missing value by the median of that variable. Now, in the second do loop, I want to create a new variable, that has the same name as the variable currently accessed by the do loop, except that I want to add the suffix mv to it. Then I want this new variable to take the value 1. What I want is in fact a statement that does:

vname(wit(i))||mv = 1;

(but his doensn't work because one cannot create a new variable name like that).

Matthijs

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Cannot do it that way.  You will need to use a little code generation.

You could query the SAS metadata to get a list of the variables and generate the new variable names.

proc sql noprint ;

  select name, trim(name)||'mv'

     into :vlist separted by ' '

        , :vlist2 separated by ' '

   from dictionary.columns

   where libname='WORK'

     and memname='BASETABLENUM'

     and type='num'

  ;

quit;

data basetabel_imputed;

  set basetablenum;

  array wit &vlist;

  array new &vlist2;

  do i=1 to dim(wit);

    new(i) = (wit(i) = .);

    if new(i) then do;

      wit(i)= symget(vname(wit(i));

    end;

  end;

  drop i;

run;

Now I am confused about why you are using macro variables to transfer the medians.  It is much better use SAS datasets to do that.

proc sql noprint ;

  select name

       , trim(name)||'mv'

       , trim(name)||'median'

    into :vlist separted by ' '

       , :vlist2 separated by ' '

       , :vlist3 separated by ' '

   from dictionary.columns

   where libname='WORK'

     and memname='BASETABLENUM'

     and type='num'

  ;

quit;

proc summary data=basetablenum ;

  var &vlist ;

  output out=basetable_medians median=&vlist3;

run;

data basetabel_imputed;

  set basetablenum;

  if _N_=1 then set basetable_medians;

  array wit &vlist;

  array new &vlist2;

  array median &vlist3 ;

  do i=1 to dim(wit);

    new(i) = (wit(i) = .);

    if new(i) then do;

      wit(i)= median(i);

    end;

  end;

  drop i &vlist3;

run;

View solution in original post

10 REPLIES 10
Tom
Super User Tom
Super User

Cannot do it that way.  You will need to use a little code generation.

You could query the SAS metadata to get a list of the variables and generate the new variable names.

proc sql noprint ;

  select name, trim(name)||'mv'

     into :vlist separted by ' '

        , :vlist2 separated by ' '

   from dictionary.columns

   where libname='WORK'

     and memname='BASETABLENUM'

     and type='num'

  ;

quit;

data basetabel_imputed;

  set basetablenum;

  array wit &vlist;

  array new &vlist2;

  do i=1 to dim(wit);

    new(i) = (wit(i) = .);

    if new(i) then do;

      wit(i)= symget(vname(wit(i));

    end;

  end;

  drop i;

run;

Now I am confused about why you are using macro variables to transfer the medians.  It is much better use SAS datasets to do that.

proc sql noprint ;

  select name

       , trim(name)||'mv'

       , trim(name)||'median'

    into :vlist separted by ' '

       , :vlist2 separated by ' '

       , :vlist3 separated by ' '

   from dictionary.columns

   where libname='WORK'

     and memname='BASETABLENUM'

     and type='num'

  ;

quit;

proc summary data=basetablenum ;

  var &vlist ;

  output out=basetable_medians median=&vlist3;

run;

data basetabel_imputed;

  set basetablenum;

  if _N_=1 then set basetable_medians;

  array wit &vlist;

  array new &vlist2;

  array median &vlist3 ;

  do i=1 to dim(wit);

    new(i) = (wit(i) = .);

    if new(i) then do;

      wit(i)= median(i);

    end;

  end;

  drop i &vlist3;

run;

matthijs1
Calcite | Level 5

Thank you very much Tom!

In fact, this was the code that came before:

proc means data=basetablenum noprint;

var _numeric_;

output out=three (drop=_freq_ _type_)

median= /* / autoname */ ;

run;

data _NULL_;

set three;

array ass{*} _numeric_;

do i=1 to dim(ass);

      call symput(vname(ass(i)),ass(i));

end;

drop i;

run;

I'm not a very experienced SAS-programmer, so I just took the median and then converted that to macro variables. I assume your code is much more efficient than mine, so thank you!

Tom
Super User Tom
Super User

The problem with moving the data into macro variables and back is more that you could lose precision.  Macro variables are just character strings and so will not match exactly the original floating point numbers.

Also if you use datasets then you can easily add grouping variables by using BY statements.

MikeZdeb
Rhodochrosite | Level 12

Hi ... here's another approach ... hopefully I understand the task ...

#1  replace missing numeric values with medians

#2  construct a new set of variables with 0/1 values with 1 indicating a value was replaced by the median

If I got the task correct, the only difference between your request about new variable names and what is

produced by the following is that the "MV" is used as a prefix (not a suffix) for the new variables (maybe

this is a better result since it allows you to refer to all those variables as a group using a wildcard, MV:) .

* make a data set with some missng values;

data class;

set sashelp.class;

if ranuni(987) le .5;

if ranuni(987) le .2 then

   call missing(weight);

if ranuni(987) le .1 then

   call missing(sex, age);

run;

Name     Sex    Age    Height    Weight

Alice              .     56.5       84.0

Jane       F      12     59.8       84.5

Janet      F      15     62.5      112.5

John       M      12     59.0         .

Joyce      F      11     51.3       50.5

Judy       F      14     64.3       90.0

Mary       F      15     66.5      112.0

Philip     M      16     72.0         .

Robert             .     64.8         .

Thomas     M      11     57.5       85.0

* use PROC STDIZE to replace missing values with medians, keep original values with new names (prefix MV);

proc stdize data=class out=_class reponly oprefix=mv method=median;

run;

* change MV variables to 0/1;

data _class;

set _class;

array mv(*) mv: ;

do _n_=1 to dim(mv);

   mv(_n_) = missing(mv(_n_));

end;

run;

                  mv      mv        mv

Name     Sex    Age    Height    Weight    Age    Height    Weight

Alice             1        0         0       13     56.5       84.0

Jane       F      0        0         0       12     59.8       84.5

Janet      F      0        0         0       15     62.5      112.5

John       M      0        0         1       12     59.0       85.0

Joyce      F      0        0         0       11     51.3       50.5

Judy       F      0        0         0       14     64.3       90.0

Mary       F      0        0         0       15     66.5      112.0

Philip     M      0        0         1       16     72.0       85.0

Robert            1        0         1       13     64.8       85.0

Thomas     M      0        0         0       11     57.5       85.0

matthijs1
Calcite | Level 5

You completely understand the question Mike. Wauw, this is a very short solution, thank you so much. (Just noting that it only works in sas 9.3, sas 9.2 doesn't seem to have the prefix options yet)

art297
Opal | Level 21

I think that you could still use Mike's solution in pre-9.3, you would just have to replace the prefix option with a three-way merge.  E.g.:

data class;

  set sashelp.class;

  if ranuni(987) le .5;

  if ranuni(987) le .2 then

   call missing(weight);

  if ranuni(987) le .1 then

   call missing(sex, age);

run;

* use PROC STDIZE to replace missing values with medians, keep original values with new names (prefix MV);

proc stdize data=class out=_class reponly method=median;

run;

data _class (drop=orig:);

  merge class (rename=(age=original_age

               height=original_height

               weight=original_weight))

    _class(drop=sex rename=

     (age=mv_age height=mv_height weight=mv_weight))

    _class(drop=sex);

  by name;

  array mv(*) mv: ;

  array original(*) orig:;

  do _n_=1 to dim(mv);

    mv(_n_) = missing(original(_n_));

  end;

run;

MikeZdeb
Rhodochrosite | Level 12

Hi ... here's a 9.2 solution.  It gives a warning in PROC SQL ...

WARNING: This CREATE TABLE statement recursively references the target table. A consequence of this is a possible data integrity problem.

so, you might want to use a new data set name in the CREATE TABLE portion (it does work with these data even with the warning).

data class;

set sashelp.class;

if ranuni(987) le .5;

if ranuni(987) le .2 then

   call missing(weight);

if ranuni(987) le .1 then

   call missing(sex, age);

run;

proc stdize data=class out=_class reponly method=median;

var _numeric_;

run;

proc sql noprint;

select catt(name,'=mv',name) into :rename separated by ' '

from dictionary.columns

where libname eq 'WORK' and memname eq 'CLASS' and type eq 'num';

create table _class as

select *

from _class natural join class (rename=(&rename));

quit;

* change MV variables to 0/1;

data _class;

set _class;

array mv(*) mv: ;

do _n_=1 to dim(mv);

   mv(_n_) = missing(mv(_n_));

end;

run;

Or you can avoid the JOIN in SQL and just use a one-to-one merge ...

proc sql noprint;

select catt(name,'=mv',name) into :rename separated by ' '

from dictionary.columns

where libname eq 'WORK' and memname eq 'CLASS' and type eq 'num';

quit;

* change MV variables to 0/1;

data _class;

merge _class class (rename=(&rename));

array mv(*) mv: ;

do _n_=1 to dim(mv);

   mv(_n_) = missing(mv(_n_));

end;

run;


Peter_C
Rhodochrosite | Level 12

matthijs1

it seems a little redundant to use a whole numeric variable to hold a 0/1 flag (indicating which missing var. has been replaced with a median).

If you used a string with a character for each (numeric) variable, just set "1" for each var

The following code adapts your original,

  1. to load the median values (when not already loaded
  2. to assign a 1 to the corresponding character in median_indicators  ;

%let n_numerics = 200 ;  * or any other number that is sufficient (under 32768) ;

data basetabel_imputed ;  

   length median_indicators $&n_numerics ;

   array witM(&n_numerics) _temporary_ ;

   array wit{*} _numeric_;

   if missing( witM(1) ) then do ;

      *** collect medians from proc means output ;

      set three ;

      do i=1 to dim( wit) ;

         witM(i) = wit(i) ;

      end ;

   end ;

   set basetablenum;

   array wit{*} _numeric_;

   do i=1 to dim(wit);

      if wit(i)=. then do;

         wit(i)= witM(i);

         substr( median_indicators, i, i ) = '1' ;

      end;

   end;

   drop i;

run;

Then you have applied the medians and each row has a string with a 1 for each numeric with a median applied. When blank, no medians were applied.

Should you need to replace that 200 constant with the actual number of numeric variables, it can be obtained with just a llittle more code: My preference is:

proc contents data= your.dataset noprint out=_data_ ; run ;

proc sql noprint ;

%let n_numerics = 0 ;

   select count(*) into :n_numerics separated by ' '

     from &syslast where type=1 /*numeric*/

;

%put NOTE: there are &n_numerics numeric variables ;

quit ;

MikeZdeb
Rhodochrosite | Level 12

Hi ... here's another way to make that vector of median substitution indicators (from one of

Howard Schreier's SAS-L postings on the use of _FILE_) rather than a group of new variables.

(_FILE_ provides a way to concatenate formatted values of variables).

* make a data set with some missng values;

data class;

set sashelp.class;

if ranuni(987) le .5;

if ranuni(987) le .2 then

   call missing(weight);

if ranuni(987) le .1 then

   call missing(sex, age);

run;

proc format;

value fix low-high = '0' other = '1';

run;

filename nosee dummy;

data class;

file nosee;

set class;

put (_numeric_) (fix.) @;

median_indicators = _file_;

put;

run;

proc stdize data=class out=_class reponly method=median;

run;

                                            median_

Name     Sex    Age    Height    Weight    indicators

Alice             13     56.5       84.0       100

Jane       F      12     59.8       84.5       000

Janet      F      15     62.5      112.5       000

John       M      12     59.0       85.0       001

Joyce      F      11     51.3       50.5       000

Judy       F      14     64.3       90.0       000

Mary       F      15     66.5      112.0       000

Philip     M      16     72.0       85.0       001

Robert            13     64.8       85.0       101

Thomas     M      11     57.5       85.0       000

Peter_C
Rhodochrosite | Level 12

as usual with Howard's stuff, it's a really neat idea!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 10473 views
  • 1 like
  • 5 in conversation