I have a table with 7 columns, 5 of which have missing values.
Taking an example from the code below in table HAVE; row (PAT_ID) 4 has missing values for AGE, INCOME_BLOCK, UNEMPLOY_BLOCK, and HOMEOWNER_BLOCK – since at least one of these four columns (INCOME_BLOCK, EDU_BLOCK, UNEMPLOY_BLOCK, and HOMEOWNER_BLOCK) have a missing value, I want all values for these four columns to become missing.
data HAVE;
input PAT_ID : best2. SEX :$6. AGE : best2. INCOME_BLOCK : best6. EDU_BLOCK : best4. UNEMPLOY_BLOCK : best4. HOMEOWNER_BLOCK : best4. ;
datalines;
1 Female 22 45810 93.82 13.07 9.32
2 Male 37 35264 54.94 27.81 4.76
3 Male 27 35264 54.94 27.81 .
4 Male . . 18.62 . .
5 Female 27 42346 18.62 1.87 1.29
6 Male 28 42346 41.11 4.76 10.29
7 Female 23 250001 41.11 3.02 1.29
8 Female 21 115290 93.82 27.81 1.87
9 Female 43 215493 21.65 10.29 3.60
10 Male 35 215493 . 27.81 3.02
11 Male . 215493 15.61 27.81 3.02
12 Male 21 77521 18.62 7.17 9.32
13 Male 21 215493 54.94 5.95 27.81
14 Male 28 115290 34.35 . .
15 Male 34 115290 34.35 1.29 9.32
16 Male . 30872 34.35 10.29 7.17
17 Female 36 105039 15.61 4.76 13.07
18 Female 23 . . . .
19 Female 34 38964 9.55 4.76 2.45
20 Female 26 38964 54.94 10.29 10.29
;
data WANT;
input PAT_ID : best2. SEX :$6. AGE : best2. INCOME_BLOCK : best6. EDU_BLOCK : best4. UNEMPLOY_BLOCK : best4. HOMEOWNER_BLOCK : best4. ;
datalines;
1 Female 22 45810 93.82 13.07 9.32
2 Male 37 35264 54.94 27.81 4.76
3 Male 27 . . . .
4 Male . . . . .
5 Female 27 42346 18.62 1.87 1.29
6 Male 28 42346 41.11 4.76 10.29
7 Female 23 250001 41.11 3.02 1.29
8 Female 21 115290 93.82 27.81 1.87
9 Female 43 215493 21.65 10.29 3.60
10 Male 35 . . . .
11 Male . 215493 15.61 27.81 3.02
12 Male 21 77521 18.62 7.17 9.32
13 Male 21 215493 54.94 5.95 27.81
14 Male 28 . . . .
15 Male 34 115290 34.35 1.29 9.32
16 Male . 30872 34.35 10.29 7.17
17 Female 36 105039 15.61 4.76 13.07
18 Female 23 . . . .
19 Female 34 38964 9.55 4.76 2.45
20 Female 26 38964 54.94 10.29 10.29
;
If the variables are numeric the easiest is to use the NMISS function to count how many variables have missing values. The use the Call missing function.
if nmiss(AGE, INCOME_BLOCK, UNEMPLOY_BLOCK, HOMEOWNER_BLOCK) > 0 then
call missing(AGE, INCOME_BLOCK, UNEMPLOY_BLOCK, HOMEOWNER_BLOCK);
Call missing is one of the few routines that will work with mix of numeric and character variables and sets ALL the variables to missing.
If you have a mix of numeric and character variables you can either do a bunch of IF OR testing if any variable is missing Or use some code to create a value to examine for a known behavior such as this:
data HAVE; input PAT_ID : best2. SEX :$6. AGE : best2. INCOME_BLOCK : best6. EDU_BLOCK : best4. UNEMPLOY_BLOCK : best4. HOMEOWNER_BLOCK : best4. ; datalines; 1 Female 22 45810 93.82 13.07 9.32 2 Male 37 35264 54.94 27.81 4.76 3 Male 27 35264 54.94 27.81 . 4 Male . . 18.62 . . 5 Female 27 42346 18.62 1.87 1.29 6 . 28 42346 41.11 4.76 10.29 7 Female 23 250001 41.11 3.02 1.29 8 Female 21 115290 93.82 27.81 1.87 9 Female 43 215493 21.65 10.29 3.60 10 . 35 215493 . 27.81 3.02 11 Male . 215493 15.61 27.81 3.02 12 Male 21 77521 18.62 7.17 9.32 13 Male 21 215493 54.94 5.95 27.81 14 Male 28 115290 34.35 . . 15 Male 34 115290 34.35 1.29 9.32 16 Male . 30872 34.35 10.29 7.17 17 . 36 105039 15.61 4.76 13.07 18 Female 23 . . . . 19 Female 34 38964 9.55 4.76 2.45 20 Female 26 38964 54.94 10.29 10.29 ; options missing=' '; data want; set have; If countw ( catx('|',sex, age,INCOME_BLOCK , EDU_BLOCK,UNEMPLOY_BLOCK, HOMEOWNER_BLOCK), '|' ) < 6 then call missing(sex, age,INCOME_BLOCK , EDU_BLOCK,UNEMPLOY_BLOCK, HOMEOWNER_BLOCK); run; /* reset the option*/ options missing='.';
I made a couple of the Sex values missing.
The Catx function will do a conversion to character for the numeric values to make a character value placing a specified character, in this case the | between values. By setting the Option missing to a space the missing values are skipped (CATX removes trailing spaces, if the only value is a space, then it gets eliminated).
The Countw function the counts how many values are delimited by the | character. If the number is less than the number of variables (exercise for the interested reader how to eliminate the magic number 6 ) then at least one of the variables was missing and again Call Missing sets them to missing.
And for those who don't like writing the same variable-names multiple times, an array comes to the rescue:
data want;
set have;
array block income_block edu_block unemploy_block homeowner_block;
if nmiss(of block[*]) then call missing(of block[*]);
run;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.