I have a table with 7 columns, 5 of which have missing values.
Taking an example from the code below in table HAVE; row (PAT_ID) 4 has missing values for AGE, INCOME_BLOCK, UNEMPLOY_BLOCK, and HOMEOWNER_BLOCK – since at least one of these four columns (INCOME_BLOCK, EDU_BLOCK, UNEMPLOY_BLOCK, and HOMEOWNER_BLOCK) have a missing value, I want all values for these four columns to become missing.
data HAVE;
input PAT_ID : best2. SEX :$6. AGE : best2. INCOME_BLOCK : best6. EDU_BLOCK : best4. UNEMPLOY_BLOCK : best4. HOMEOWNER_BLOCK : best4. ;
datalines;
1 Female 22 45810 93.82 13.07 9.32
2 Male 37 35264 54.94 27.81 4.76
3 Male 27 35264 54.94 27.81 .
4 Male . . 18.62 . .
5 Female 27 42346 18.62 1.87 1.29
6 Male 28 42346 41.11 4.76 10.29
7 Female 23 250001 41.11 3.02 1.29
8 Female 21 115290 93.82 27.81 1.87
9 Female 43 215493 21.65 10.29 3.60
10 Male 35 215493 . 27.81 3.02
11 Male . 215493 15.61 27.81 3.02
12 Male 21 77521 18.62 7.17 9.32
13 Male 21 215493 54.94 5.95 27.81
14 Male 28 115290 34.35 . .
15 Male 34 115290 34.35 1.29 9.32
16 Male . 30872 34.35 10.29 7.17
17 Female 36 105039 15.61 4.76 13.07
18 Female 23 . . . .
19 Female 34 38964 9.55 4.76 2.45
20 Female 26 38964 54.94 10.29 10.29
;
data WANT;
input PAT_ID : best2. SEX :$6. AGE : best2. INCOME_BLOCK : best6. EDU_BLOCK : best4. UNEMPLOY_BLOCK : best4. HOMEOWNER_BLOCK : best4. ;
datalines;
1 Female 22 45810 93.82 13.07 9.32
2 Male 37 35264 54.94 27.81 4.76
3 Male 27 . . . .
4 Male . . . . .
5 Female 27 42346 18.62 1.87 1.29
6 Male 28 42346 41.11 4.76 10.29
7 Female 23 250001 41.11 3.02 1.29
8 Female 21 115290 93.82 27.81 1.87
9 Female 43 215493 21.65 10.29 3.60
10 Male 35 . . . .
11 Male . 215493 15.61 27.81 3.02
12 Male 21 77521 18.62 7.17 9.32
13 Male 21 215493 54.94 5.95 27.81
14 Male 28 . . . .
15 Male 34 115290 34.35 1.29 9.32
16 Male . 30872 34.35 10.29 7.17
17 Female 36 105039 15.61 4.76 13.07
18 Female 23 . . . .
19 Female 34 38964 9.55 4.76 2.45
20 Female 26 38964 54.94 10.29 10.29
;
If the variables are numeric the easiest is to use the NMISS function to count how many variables have missing values. The use the Call missing function.
if nmiss(AGE, INCOME_BLOCK, UNEMPLOY_BLOCK, HOMEOWNER_BLOCK) > 0 then
call missing(AGE, INCOME_BLOCK, UNEMPLOY_BLOCK, HOMEOWNER_BLOCK);
Call missing is one of the few routines that will work with mix of numeric and character variables and sets ALL the variables to missing.
If you have a mix of numeric and character variables you can either do a bunch of IF OR testing if any variable is missing Or use some code to create a value to examine for a known behavior such as this:
data HAVE; input PAT_ID : best2. SEX :$6. AGE : best2. INCOME_BLOCK : best6. EDU_BLOCK : best4. UNEMPLOY_BLOCK : best4. HOMEOWNER_BLOCK : best4. ; datalines; 1 Female 22 45810 93.82 13.07 9.32 2 Male 37 35264 54.94 27.81 4.76 3 Male 27 35264 54.94 27.81 . 4 Male . . 18.62 . . 5 Female 27 42346 18.62 1.87 1.29 6 . 28 42346 41.11 4.76 10.29 7 Female 23 250001 41.11 3.02 1.29 8 Female 21 115290 93.82 27.81 1.87 9 Female 43 215493 21.65 10.29 3.60 10 . 35 215493 . 27.81 3.02 11 Male . 215493 15.61 27.81 3.02 12 Male 21 77521 18.62 7.17 9.32 13 Male 21 215493 54.94 5.95 27.81 14 Male 28 115290 34.35 . . 15 Male 34 115290 34.35 1.29 9.32 16 Male . 30872 34.35 10.29 7.17 17 . 36 105039 15.61 4.76 13.07 18 Female 23 . . . . 19 Female 34 38964 9.55 4.76 2.45 20 Female 26 38964 54.94 10.29 10.29 ; options missing=' '; data want; set have; If countw ( catx('|',sex, age,INCOME_BLOCK , EDU_BLOCK,UNEMPLOY_BLOCK, HOMEOWNER_BLOCK), '|' ) < 6 then call missing(sex, age,INCOME_BLOCK , EDU_BLOCK,UNEMPLOY_BLOCK, HOMEOWNER_BLOCK); run; /* reset the option*/ options missing='.';
I made a couple of the Sex values missing.
The Catx function will do a conversion to character for the numeric values to make a character value placing a specified character, in this case the | between values. By setting the Option missing to a space the missing values are skipped (CATX removes trailing spaces, if the only value is a space, then it gets eliminated).
The Countw function the counts how many values are delimited by the | character. If the number is less than the number of variables (exercise for the interested reader how to eliminate the magic number 6 ) then at least one of the variables was missing and again Call Missing sets them to missing.
And for those who don't like writing the same variable-names multiple times, an array comes to the rescue:
data want;
set have;
array block income_block edu_block unemploy_block homeowner_block;
if nmiss(of block[*]) then call missing(of block[*]);
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.