Replacing missing values with do loop

Contributor
Posts: 33

Hey guys. I'm looking to replace missing values in a table with zero. I have written the attached code. What confuses me is that the first part of the program works, where it replaces missing msd11 values. After that, however, nothing is replaced. For the life of me I can't figure out why. I've used different indexes (j, k, l, etc). I've tried splitting them all up into different data steps. Nothing works for me. The last thing I checked was to indeed make sure that these are the variable names, and they are. As listed, the variables I'm trying to change are:

msd1101 through msd1112 (this is the one that works)

qsd111 through qsd114

ssd20111 through ssd20112

a_msd1101 through a_msd1112

a_qsd111 through a_qsd114

a_ssd20111 through a_ssd20112

Edit: I have written a simpler code which replaces all missing values with zero (per SAS documentation), but I just want to know why this one isn't working.

Accepted Solutions
Solution
‎06-27-2012 10:38 AM
Super User
Posts: 7,060

Re: Replacing missing values with do loop

Your array definitions are confused.  You are defining multiple arrays that all refer to the same variables.  _NUMERIC_ references every numeric variable defined so far in the data vector.  For the first array, that is all of the numeric variables in the input dataset.  The later arrays will also include the variable I that you introduced in the first DO loop.

You are also not looping over the full array.  Use the DIM() function to dynamically determine how many variables are in an array.

From your problem description I would code the array definitions and loops this way.

array msd11 msd1101 - msd1112;
do i = 1 to dim(msd11);
  if msd11(i) = . then msd11(i) = 0;
end;
drop i;


You only need one "DROP I;" statement, but the extra ones do not cause any trouble.  A number of people use _N_ as the loop variable because SAS will have already defined it and will always drop it.
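As a rough sketch of the _N_ approach mentioned above (the dataset names have and want are placeholders I am assuming, not taken from the original post, and I have written the array with an explicit {*} subscript):

```sas
data want;                          /* hypothetical output dataset */
  set have;                         /* hypothetical input dataset  */
  array msd11 {*} msd1101 - msd1112;
  /* _N_ is an automatic variable and is never written to the
     output dataset, so no DROP statement is needed */
  do _n_ = 1 to dim(msd11);
    if msd11{_n_} = . then msd11{_n_} = 0;
  end;
run;
```

SAS sets _N_ again at the start of the next iteration of the DATA step, so overwriting it is harmless as long as nothing later in the same iteration relies on it as the iteration counter.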


You can also just use the DO OVER loop syntax instead as you are really not using the index to encode any meaningful information.


array msd11 msd1101 - msd1112;
do over msd11;
  if msd11 = . then msd11 = 0;
end;
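To cover every group of variables listed in the question, one option is a single array that names all of them explicitly. This is only a sketch: the dataset names have and want and the array name zfill are placeholders I made up, and it assumes the variables exist with exactly the names given in the question.

```sas
data want;                          /* hypothetical output dataset */
  set have;                         /* hypothetical input dataset  */
  /* one array spanning all six variable groups from the question */
  array zfill {*} msd1101   - msd1112
                  qsd111    - qsd114
                  ssd20111  - ssd20112
                  a_msd1101 - a_msd1112
                  a_qsd111  - a_qsd114
                  a_ssd20111 - a_ssd20112;
  /* DIM() returns the number of elements, so the loop always
     covers the whole array */
  do i = 1 to dim(zfill);
    if zfill{i} = . then zfill{i} = 0;
  end;
  drop i;
run;
```

Separate arrays per group would work just as well; a single array simply keeps it to one loop and one DROP statement.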



All Replies
Super User
Posts: 11,343

Re: Replacing missing values with do loop

You only need one DROP statement.

I think your problem is a misunderstanding of what using _numeric_ in the array statement does. It places all numeric variables in the array, in EVERY array, and they will be in the same order. Since the upper bounds of your loops for the arrays other than MSD11 are all less than or equal to the number of items treated in the first array (12), you are just checking the same 12 or fewer variables over and over.
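A small self-contained sketch can make this visible in the log. Everything here is invented for illustration (the variables x, y, z and the array names arr_a and arr_b are not from the original program); it just prints which variables each _numeric_ array ends up holding.

```sas
data _null_;
  x = .; y = 1; z = .;              /* three numeric variables        */
  array arr_a {*} _numeric_;        /* compiled before I exists       */
  i = 0;                            /* a new numeric variable appears */
  array arr_b {*} _numeric_;        /* compiled after I exists        */
  length name $32;
  /* list the members of each array by name */
  do i = 1 to dim(arr_a);
    name = vname(arr_a{i});
    put 'arr_a: ' name;
  end;
  do i = 1 to dim(arr_b);
    name = vname(arr_b{i});
    put 'arr_b: ' name;
  end;
run;
```

Comparing the two lists in the log shows whether the arrays overlap and in what order the variables appear.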

If you want to set ALL missing numeric values to 0, then try:

array msd11[*] _numeric_;
do i = 1 to dim(msd11);
  if msd11[i] = . then msd11[i] = 0;
end;

Otherwise you need to specifically list each set of variables you were thinking of in each of your array declarations.

Also, your example code was short enough that you should include it in the post.

Contributor
Posts: 33

Re: Replacing missing values with do loop

Thanks guys!

Respected Advisor
Posts: 3,156

Re: Replacing missing values with do loop

Or you can just:

proc stdize data=have reponly method=sum missing=0 out=want;
  var msd1101 - msd1112;
run;

Haikuo
