BookmarkSubscribeRSS Feed
brookedanel
Fluorite | Level 6

I'm trying to do a secondary analysis on a very large dataset that has a lot of missing values. I'd like to see if there is any pattern within the missing values. I'd like to create a new dummy variable for missing so 0=missing and 1=not missing. I'm not sure how to go about this. I've created a smaller dataset that narrow down my predictor and outcome variables of interest. All of the variables are numeric/continuous and each variable has missing values. Any suggestions are greatly appreciated!

5 REPLIES 5
novinosrin
Tourmaline | Level 20

do you want as simple as this?

 

data have;
do i=1 to 100;
num=int(ranuni(76567)*100);
if mod(i,5)=0 then call missing(num);
output;
end;
run;

data want;
set have;
dummy_num=num ne .;
run;
brookedanel
Fluorite | Level 6

Thank you so much! Can you just explain what this step does

 

num=int(ranuni(76567)*100);
if mod(i,5)=0 then call missing(num);

 

novinosrin
Tourmaline | Level 20

ignore the data have step. I wanted to create some sample data with random numbers as you didn't provide a sample

 

data want is what you need

 

 

For your interest,

 

Using ranuni function I created 100 random numbers between 0 and 100. For every 5th or in other words for every multiple of 5th value, i assigned them as missing. This is what data have does,

 

Data want is a simple boolean expression i.e if num is missing then true else false in the form of 1s and 0s

FreelanceReinh
Jade | Level 19

Hi @brookedanel,

 

If all you need are numbers and percentages of missing values for each variable or similar results for combinations of variables, you don't need dummy variables. Just use a format.

 

Example:

proc format;
value miss
._-.z = 'missing'
other = 'non-missing';

value $miss
' '   = 'missing'
other = 'non-missing';
run;

proc freq data=yourdata;
format _numeric_ miss. _character_ $miss.;
tables _all_ / missing; /* or replace _all_ with another variable list */
run;

 

Edit: The output (here for SASHELP.HEART) will look something like this:

The FREQ Procedure

                                        Cumulative    Cumulative
Status         Frequency     Percent     Frequency      Percent
----------------------------------------------------------------
non-missing        5209      100.00          5209       100.00


                         Cause of Death

                                        Cumulative    Cumulative
DeathCause     Frequency     Percent     Frequency      Percent
----------------------------------------------------------------
missing            3218       61.78          3218        61.78
non-missing        1991       38.22          5209       100.00


                        Age CHD Diagnosed

                                        Cumulative    Cumulative
 AgeCHDdiag    Frequency     Percent     Frequency      Percent
----------------------------------------------------------------
missing            3760       72.18          3760        72.18
non-missing        1449       27.82          5209       100.00

... portions of the output are not shown ...

 

 Edit 2: Here is an example using a combination of variables:

proc freq data=sashelp.heart;
format _numeric_ miss. _character_ $miss.;
tables status*deathcause*agechddiag / missing list;
run;

Output:

                                                                      Cumulative    Cumulative
Status         DeathCause      AgeCHDdiag    Frequency     Percent     Frequency      Percent
----------------------------------------------------------------------------------------------
non-missing    missing        missing            2663       51.12          2663        51.12
non-missing    missing        non-missing         555       10.65          3218        61.78
non-missing    non-missing    missing            1097       21.06          4315        82.84
non-missing    non-missing    non-missing         894       17.16          5209       100.00
brookedanel
Fluorite | Level 6

Thank you so much @FreelanceReinh. I'm going to give this a shot. It's similar to the code I was using but I see now where I missed something. I appreciate the quick reply. 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 2993 views
  • 2 likes
  • 3 in conversation