DATA Step, Macro, Functions and more

Handling Missing values

Accepted Solution Solved
Reply
Contributor
Posts: 21
Accepted Solution

Handling Missing values

Hi all,

 

I have data with missing values. Is there anyway I can convert missing values ( . or " ") into '-9' regardless of the type of variable.

 

data have;
input id name$ gen$ PROCHI DATE TC HDL LDL Ttg;
datalines;
1   ram      male      1   1.1   8.0   3.0   1.45    1
2   mohan male      1   1.1   8.5   2.9   1.24     .
3   shyam   .           2   2.1   6.8   3.1    2.23    2 
4   radha   female   2   2.1   7.2   3.8    3.80    .
5   gita      female   2   2.1    .      2.7    3.45    3
6   sita       .            2   3.1    8.9  6.8    4.56    2
7   ranu     female  3    2.1   7.5   5.9    1.83   5
8   tina      female   3   2.1   3.6   2.8    1.23   8
9   shan    male      4   8.1   2.8   1.1    1.67   9 
;
run;

 

data I want like this 

 

1   ram      male      1   1.1   8.0   3.0   1.45    1
2   mohan male      1   1.1   8.5   2.9   1.24     -9
3   shyam   -9           2   2.1   6.8   3.1    2.23    2 
4   radha   female   2   2.1   7.2   3.8    3.80    -9
5   gita      female   2   2.1    -9      2.7    3.45    3
6   sita       -9           2   3.1    8.9  6.8    4.56    2
7   ranu     female  3    2.1   7.5   5.9    1.83   5
8   tina      female   3   2.1   3.6   2.8    1.23   8
9   shan    male      4   8.1   2.8   1.1    1.67   9 

 

I have tried a below code- 

data have;

set want ;

if name = " " then name = "-9";

if TC = . then TC = -9 ;

if Ttg = . then Ttg = -9;

run;

 

But if we have 100 variables, it's not possible to write all the variables in one command. Is there any other way?

 

Thanks in advance

 

 


Accepted Solutions
Solution
2 weeks ago
PROC Star
Posts: 1,547

Re: Handling Missing values

Posted in reply to mehul4frnds

@KurtBremser 's great intuition of variable with length 1 byte is something you need a new assignment aka copy the values to the new variable assigned with a higher length and the rest is a breeze

data have;
input id name$ gen$ PROCHI DATE TC HDL LDL Ttg;
datalines;
1   ram      male      1   1.1   8.0   3.0   1.45    1
2   mohan male      1   1.1   8.5   2.9   1.24     .
3   shyam   .           2   2.1   6.8   3.1    2.23    2 
4   radha   female   2   2.1   7.2   3.8    3.80    .
5   gita      female   2   2.1    .      2.7    3.45    3
6   sita       .            2   3.1    8.9  6.8    4.56    2
7   ranu     female  3    2.1   7.5   5.9    1.83   5
8   tina      female   3   2.1   3.6   2.8    1.23   8
9   shan    male      4   8.1   2.8   1.1    1.67   9 
;
run;


data want;
set have;
array c(*) _character_ ;
array num _numeric_;
do _n_=1 to dim(c);
if missing(c(_n_)) then c(_n_)=left(-9);
end;
do _n_=1 to dim(num);
if missing(num(_n_)) then num(_n_)=-9;
end;
run;

View solution in original post


All Replies
Respected Advisor
Posts: 2,796

Re: Handling Missing values

Posted in reply to mehul4frnds

I think ARRAYs would work.

 

But in general, the easiest way to handle missing values is to NOT convert them to –9. SAS knows what to do with missing values, it does not know what to do with –9.

--
Paige Miller
Contributor
Posts: 21

Re: Handling Missing values

Posted in reply to PaigeMiller

Thanks for the reply.

Actually, I have to do a genetic analysis where protocol suggests that missing values should be converted into -9

Super User
Super User
Posts: 9,386

Re: Handling Missing values

Posted in reply to mehul4frnds

I quite agree with @PaigeMiller, it makes no sense to convert missings to -9, let alone changing character string to -9.  I mean, really no sense at all for characters.  For numerics, its possible that the stato has planned that, however I would question it and get specific confirmation.  Now if it really is just numbers, then maybe:

options missing=-9;

Will work for you, if not:

data want;
  set have;
  array n{*} _numeric_;
  do i=1 to dim(n);
    if n=. then n=-9;
  end;
run;

As for character, its not worth providing code, is a more complex process for something which doesn't make any sense.

Super User
Posts: 9,867

Re: Handling Missing values

Posted in reply to mehul4frnds

And what will you do with an empty character variable of length 1?

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Contributor
Posts: 21

Re: Handling Missing values

Posted in reply to KurtBremser

Thanks for the reply.

 

Yes, first I have to change the format in to 2 characters then only i can add missing value data

Super User
Posts: 9,867

Re: Handling Missing values

Posted in reply to mehul4frnds

@mehul4frnds wrote:

Thanks for the reply.

 

Yes, first I have to change the format in to 2 characters then only i can add missing value data


Changing the format won't do it. You need to create a new variable to replace the old one.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Contributor
Posts: 21

Re: Handling Missing values

Posted in reply to KurtBremser

Thanks. I agree with you that a new variable needs to be created. 

 

Kindly suggest about overall conversion 

Super User
Posts: 9,867

Re: Handling Missing values

Posted in reply to mehul4frnds

Retrieve variable attributes from dictionary.columns or sashelp.vcolumn, and create a data step from that with call execute.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Solution
2 weeks ago
PROC Star
Posts: 1,547

Re: Handling Missing values

Posted in reply to mehul4frnds

@KurtBremser 's great intuition of variable with length 1 byte is something you need a new assignment aka copy the values to the new variable assigned with a higher length and the rest is a breeze

data have;
input id name$ gen$ PROCHI DATE TC HDL LDL Ttg;
datalines;
1   ram      male      1   1.1   8.0   3.0   1.45    1
2   mohan male      1   1.1   8.5   2.9   1.24     .
3   shyam   .           2   2.1   6.8   3.1    2.23    2 
4   radha   female   2   2.1   7.2   3.8    3.80    .
5   gita      female   2   2.1    .      2.7    3.45    3
6   sita       .            2   3.1    8.9  6.8    4.56    2
7   ranu     female  3    2.1   7.5   5.9    1.83   5
8   tina      female   3   2.1   3.6   2.8    1.23   8
9   shan    male      4   8.1   2.8   1.1    1.67   9 
;
run;


data want;
set have;
array c(*) _character_ ;
array num _numeric_;
do _n_=1 to dim(c);
if missing(c(_n_)) then c(_n_)=left(-9);
end;
do _n_=1 to dim(num);
if missing(num(_n_)) then num(_n_)=-9;
end;
run;
Contributor
Posts: 21

Re: Handling Missing values

Posted in reply to novinosrin

Thank you @novinosrin so much for the reply.

 

I used @KurtBremser 's suggestion and your codes work very well.

 

Thanks again

 

PROC Star
Posts: 1,547

Re: Handling Missing values

Posted in reply to mehul4frnds

@mehul4frnds You are welcome. I am glad!

Super User
Posts: 13,283

Re: Handling Missing values

Posted in reply to mehul4frnds

Another way is to read the data as desired in the first case:

proc format library=work;
invalue $nomiss
' ','.'='-9'
other=_same_;
invalue nomiss
' ','.'=-9
other=_same_;
run;

data have;
length name $ 10 gen $ 6;

informat id nomiss. name $nomiss. gen $nomiss. PROCHI DATE TC HDL LDL Ttg nomiss.;
input id name$ gen$ PROCHI DATE TC HDL LDL Ttg;
datalines;
1   ram   male    1   1.1   8.0   3.0   1.45   1
2   mohan male    1   1.1   8.5   2.9   1.24   .
3   shyam .       2   2.1   6.8   3.1   2.23   2 
4   radha female  2   2.1   7.2   3.8   3.80   .
5   gita  female  2   2.1   .     2.7   3.45   3
6   sita  .       2   3.1   8.9   6.8   4.56   2
7   ranu  female  3   2.1   7.5   5.9   1.83   5
8   tina  female  3   2.1   3.6   2.8   1.23   8
9   shan  male    4   8.1   2.8   1.1   1.67   9 
;
run;

This requires a little more from you to set the lengths of the string variables with either a Length or Attrib statement to prevent the character informat from setting the length to 2. I did not attempt to set a default length for the $nomiss informat as I have no idea what range of values you might be attempting to read.

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 135 views
  • 8 likes
  • 6 in conversation