Hi all,
I have data with missing values. Is there anyway I can convert missing values ( . or " ") into '-9' regardless of the type of variable.
data have;
input id name$ gen$ PROCHI DATE TC HDL LDL Ttg;
datalines;
1 ram male 1 1.1 8.0 3.0 1.45 1
2 mohan male 1 1.1 8.5 2.9 1.24 .
3 shyam . 2 2.1 6.8 3.1 2.23 2
4 radha female 2 2.1 7.2 3.8 3.80 .
5 gita female 2 2.1 . 2.7 3.45 3
6 sita . 2 3.1 8.9 6.8 4.56 2
7 ranu female 3 2.1 7.5 5.9 1.83 5
8 tina female 3 2.1 3.6 2.8 1.23 8
9 shan male 4 8.1 2.8 1.1 1.67 9
;
run;
data I want like this
1 ram male 1 1.1 8.0 3.0 1.45 1
2 mohan male 1 1.1 8.5 2.9 1.24 -9
3 shyam -9 2 2.1 6.8 3.1 2.23 2
4 radha female 2 2.1 7.2 3.8 3.80 -9
5 gita female 2 2.1 -9 2.7 3.45 3
6 sita -9 2 3.1 8.9 6.8 4.56 2
7 ranu female 3 2.1 7.5 5.9 1.83 5
8 tina female 3 2.1 3.6 2.8 1.23 8
9 shan male 4 8.1 2.8 1.1 1.67 9
I have tried a below code-
data have;
set want ;
if name = " " then name = "-9";
if TC = . then TC = -9 ;
if Ttg = . then Ttg = -9;
run;
But if we have 100 variables, it's not possible to write all the variables in one command. Is there any other way?
Thanks in advance
@Kurt_Bremser 's great intuition of variable with length 1 byte is something you need a new assignment aka copy the values to the new variable assigned with a higher length and the rest is a breeze
data have;
input id name$ gen$ PROCHI DATE TC HDL LDL Ttg;
datalines;
1 ram male 1 1.1 8.0 3.0 1.45 1
2 mohan male 1 1.1 8.5 2.9 1.24 .
3 shyam . 2 2.1 6.8 3.1 2.23 2
4 radha female 2 2.1 7.2 3.8 3.80 .
5 gita female 2 2.1 . 2.7 3.45 3
6 sita . 2 3.1 8.9 6.8 4.56 2
7 ranu female 3 2.1 7.5 5.9 1.83 5
8 tina female 3 2.1 3.6 2.8 1.23 8
9 shan male 4 8.1 2.8 1.1 1.67 9
;
run;
data want;
set have;
array c(*) _character_ ;
array num _numeric_;
do _n_=1 to dim(c);
if missing(c(_n_)) then c(_n_)=left(-9);
end;
do _n_=1 to dim(num);
if missing(num(_n_)) then num(_n_)=-9;
end;
run;
I think ARRAYs would work.
But in general, the easiest way to handle missing values is to NOT convert them to –9. SAS knows what to do with missing values, it does not know what to do with –9.
Thanks for the reply.
Actually, I have to do a genetic analysis where protocol suggests that missing values should be converted into -9
I quite agree with @PaigeMiller, it makes no sense to convert missings to -9, let alone changing character string to -9. I mean, really no sense at all for characters. For numerics, its possible that the stato has planned that, however I would question it and get specific confirmation. Now if it really is just numbers, then maybe:
options missing=-9;
Will work for you, if not:
data want; set have; array n{*} _numeric_; do i=1 to dim(n); if n=. then n=-9; end; run;
As for character, its not worth providing code, is a more complex process for something which doesn't make any sense.
And what will you do with an empty character variable of length 1?
Thanks for the reply.
Yes, first I have to change the format in to 2 characters then only i can add missing value data
@mehul4frnds wrote:
Thanks for the reply.
Yes, first I have to change the format in to 2 characters then only i can add missing value data
Changing the format won't do it. You need to create a new variable to replace the old one.
Thanks. I agree with you that a new variable needs to be created.
Kindly suggest about overall conversion
Retrieve variable attributes from dictionary.columns or sashelp.vcolumn, and create a data step from that with call execute.
@Kurt_Bremser 's great intuition of variable with length 1 byte is something you need a new assignment aka copy the values to the new variable assigned with a higher length and the rest is a breeze
data have;
input id name$ gen$ PROCHI DATE TC HDL LDL Ttg;
datalines;
1 ram male 1 1.1 8.0 3.0 1.45 1
2 mohan male 1 1.1 8.5 2.9 1.24 .
3 shyam . 2 2.1 6.8 3.1 2.23 2
4 radha female 2 2.1 7.2 3.8 3.80 .
5 gita female 2 2.1 . 2.7 3.45 3
6 sita . 2 3.1 8.9 6.8 4.56 2
7 ranu female 3 2.1 7.5 5.9 1.83 5
8 tina female 3 2.1 3.6 2.8 1.23 8
9 shan male 4 8.1 2.8 1.1 1.67 9
;
run;
data want;
set have;
array c(*) _character_ ;
array num _numeric_;
do _n_=1 to dim(c);
if missing(c(_n_)) then c(_n_)=left(-9);
end;
do _n_=1 to dim(num);
if missing(num(_n_)) then num(_n_)=-9;
end;
run;
Thank you @novinosrin so much for the reply.
I used @Kurt_Bremser 's suggestion and your codes work very well.
Thanks again
@mehul4frnds You are welcome. I am glad!
Another way is to read the data as desired in the first case:
proc format library=work; invalue $nomiss ' ','.'='-9' other=_same_; invalue nomiss ' ','.'=-9 other=_same_; run; data have; length name $ 10 gen $ 6; informat id nomiss. name $nomiss. gen $nomiss. PROCHI DATE TC HDL LDL Ttg nomiss.; input id name$ gen$ PROCHI DATE TC HDL LDL Ttg; datalines; 1 ram male 1 1.1 8.0 3.0 1.45 1 2 mohan male 1 1.1 8.5 2.9 1.24 . 3 shyam . 2 2.1 6.8 3.1 2.23 2 4 radha female 2 2.1 7.2 3.8 3.80 . 5 gita female 2 2.1 . 2.7 3.45 3 6 sita . 2 3.1 8.9 6.8 4.56 2 7 ranu female 3 2.1 7.5 5.9 1.83 5 8 tina female 3 2.1 3.6 2.8 1.23 8 9 shan male 4 8.1 2.8 1.1 1.67 9 ; run;
This requires a little more from you to set the lengths of the string variables with either a Length or Attrib statement to prevent the character informat from setting the length to 2. I did not attempt to set a default length for the $nomiss informat as I have no idea what range of values you might be attempting to read.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.