We have this table, but we need to replace missing values with "-".
data have;
input Question sequence sequence2 var1 var4 var8;
datalines;
A 1 0.1 . . .
Yes 1 1 VALUE . VALUE
No 1 2 VALUE . VALUE
Ukn 1 3 VALUE . VALUE
B 2 0.1 . . .
Yes 2 1 . VALUE .
No 2 2 . VALUE .
Ukn 2 3 . VALUE .
C 3 0.1 . . .
Yes 3 1 . . VALUE
No 3 2 . . VALUE
Ukn 3 3 . . VALUE
;
run,
But missing values in rows for when sequence2 = 0.1 should remain missing. See the desired output:
Question | sequence | sequence2 | var1 | var4 | var8 |
A | 1 | 0.1 | |||
Yes | 1 | 1 | VALUE | -- | VALUE |
No | 1 | 2 | VALUE | -- | VALUE |
Ukn | 1 | 3 | VALUE | -- | VALUE |
B | 2 | 0.1 | |||
Yes | 2 | 1 | -- | VALUE | -- |
No | 2 | 2 | -- | VALUE | -- |
Ukn | 2 | 3 | -- | VALUE | -- |
C | 3 | 0.1 | |||
Yes | 3 | 1 | -- | -- | VALUE |
No | 3 | 2 | -- | -- | VALUE |
Ukn | 3 | 3 | -- | -- | VALUE |
That you've got a period stored in a character variable indicates some numeric to character conversion of your data somewhere upstream which doesn't address that missings in characters should be represented by a blank.
Is the change you're after just for display/a report or do you really want to change the values permanently?
For a report look into Proc Report and compute blocks that would allow you to change the display of variables based on another variable.
For actually changing the internal values in your table code along the line of below should do.
data want;
set have;
array vars(*) var1 var4 var8;
do _i=1 to dim(vars);
if vars(_i)='.' then
do;
if sequence2=0.1 then vars(_i)='--';
else vars(_i)=' ';
end;
end;
drop _i;
run;
are your variables actually numeric or character?
Your data step attempts to read variables Var1 Var4 and Var8 as numeric. So all those text "VALUE" also become missing because "VALUE" is not acceptable a numeric value.
SAS actually has 28 "levels" of missing. The . is just one of them. Special missing can also be indicated as . plus a letter or the underscore character. .A .B ._
So you could assign a custom format to display special missing as a dash, double dash, word or what have you.
This modifies your data step to 1) read question as a character varaible 2) changed "value" to 9 so it doesn't throw errors when read and replaces the . with .A in places you indicated.
The custom format then will display the values as desired.
data have; input Question $ sequence sequence2 var1 var4 var8; datalines; A 1 0.1 . . . Yes 1 1 9 .a 9 No 1 2 9 .a 9 Ukn 1 3 9 .a 9 B 2 0.1 . . . Yes 2 1 .a 9 .a No 2 2 .a 9 .a Ukn 2 3 .a 9 .a C 3 0.1 . . . Yes 3 1 .a .a 9 No 3 2 .a .a 9 Ukn 3 3 .a .a 9 ; run, proc format; value mycustommiss . = ' ' .A= '--' other=[best5.]; run; proc print data=have; format var1 var4 var8 mycustommiss.; run;
If you look at the data set in a the table viewer you will see A, not .A. That is okay and expected.
You can assign a special missing like .A (case doesn't matter) .Z or whatever anyplace you would assign . as a value.
Note that the FORMAT will need to be available when using these values to display as desired.
Special missing will not be included in statistics (n, sum, mean std deviation etc) unless you use/request the MISSING option.
I have used these to document reasons data is missing such as in a survey where a respondent might refuse a question (.R) or answer "Don't Know" (.D) when I don't want those counted in the results.
Hi, thank you for responding. var1, var4, var8 etc are character variables. Data=want is from an output, so I cannot change that (it comes from a macro and I rather don't change the macro). I am not sure I understand how to use format to change to . and .a. Is there any other way to program this?
If you do a PROC FREQ of your character variables VAR1, VAR4, and VAR8, remember that the blanks will be treated as missing values by PROC FREQ, but the dashes will not be so treated. Is that what you want?
But if you really have two types of missingness, indicated by the values " " (blank) and "-", and you want them both automatically treated as missing by various PROC's, then you might be well advised to convert those variables to numeric variables, which as @ballardw has pointed out, supports up to 28 distinct missing values. You can always assign formats to those variables, so that the displayed values are "Value", " ", and "-", even though the internal values are numeric.
That you've got a period stored in a character variable indicates some numeric to character conversion of your data somewhere upstream which doesn't address that missings in characters should be represented by a blank.
Is the change you're after just for display/a report or do you really want to change the values permanently?
For a report look into Proc Report and compute blocks that would allow you to change the display of variables based on another variable.
For actually changing the internal values in your table code along the line of below should do.
data want;
set have;
array vars(*) var1 var4 var8;
do _i=1 to dim(vars);
if vars(_i)='.' then
do;
if sequence2=0.1 then vars(_i)='--';
else vars(_i)=' ';
end;
end;
drop _i;
run;
Thank you for responding and explaining. The change we are after is for display/report. I am not sure I follow how to code this in proc report.
Thanks again for your help.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.