Treat Missing values as largest possible values

Regular Contributor
Posts: 167

Hi,

 

 

I am trying to join two datasets, A and B. While joining them I apply the following condition.

 

if (A1=B1) AND (A2=B2) AND (ABS(A3-B3) LE 0.05 OR ABS(A4-B4) LE 0.05)

 

In the above condition, if any of A3, B3, A4, or B4 is missing, ABS(A3-B3) or ABS(A4-B4) evaluates to a missing value (.), which SAS always treats as less than 0.05, so the comparison is true.

 

Is there any way to treat missing values as the largest possible numbers?
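
A minimal sketch of the behaviour described in the question: SAS orders numeric missing values below every number, so an LE comparison against a missing result is true. The dataset name demo and the variable names diff and is_le are made up for illustration.

```sas
data demo;
  a3 = .;
  b3 = 5;
  diff  = abs(a3 - b3);     /* arithmetic on a missing value yields missing */
  is_le = (diff le 0.05);   /* 1, because missing sorts below any number */
  put diff= is_le=;         /* writes diff=. is_le=1 to the log */
run;
```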

 

 

Thanks in advance,

Sheeba Swaminathan


Accepted Solutions
Solution
‎12-23-2016 01:20 PM
Super User
Posts: 11,343

Re: Treat Missing values as largest possible values

Do A3 and B3 or A4 and B4 ever both have missing values? If so you may have to add some additional levels of comparison such as

 

( ( ABS(A3-B3) LE 0.05) and not ( missing(A3) and Missing(B3) ) )

 

but more details, such as actual values and the desired results, may be needed. Sometimes it may be better to subset the data into easier chunks and then recombine.
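
As a rough sketch of this kind of guard for a single pair: the condition in the reply excludes only the case where both values are missing, while the variant below assumes that a pair with either value missing should not match (which is what treating missing as "largest possible" would imply). The dataset name pair_check, the flag close3, and the sample rows are hypothetical.

```sas
data pair_check;
  input A3 B3;
  /* 1 only when both values are present and within the tolerance */
  close3 = (nmiss(A3, B3) = 0 and abs(A3 - B3) le 0.05);
  datalines;
1.00 1.03
. 1.03
. .
1.00 2.00
;
run;
/* close3 is 1 for the first row and 0 for the other three */
```

NMISS counts the missing arguments, so nmiss(A3, B3) = 0 means both values are present.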



All Replies
Contributor
Posts: 43

Re: Treat Missing values as largest possible values

[ Edited ]

If you use the SUM function instead of plain arithmetic, missing values are handled differently.

 

for example 

 

data temp;

  a = .;

  b= 5;

  c=sum(a,-b);  /* c evaluates to -5 */

  d = a-b;  /* d evaluates to missing */

run;  

 

Try:

if (A1=B1) AND (A2=B2) AND (ABS(SUM(A3,-B3)) LE 0.05 OR ABS(SUM(A4,-B4)) LE 0.05)
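
A small check of how this SUM-based comparison behaves, offered as a sketch with made-up values: because SUM ignores missing arguments, a missing A3 is effectively treated as 0, and when both arguments are missing SUM itself returns missing. The dataset name sum_cases and the variables d and is_le are illustrative only.

```sas
data sum_cases;
  input A3 B3;
  d     = abs(sum(A3, -B3));   /* SUM skips missing arguments */
  is_le = (d le 0.05);
  datalines;
. 5
. 0.02
. .
;
run;
/* row 1: d=5     is_le=0                                    */
/* row 2: d=0.02  is_le=1 (missing A3 acts like 0)           */
/* row 3: d=.     is_le=1 (all arguments missing gives .)    */
```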

Regular Contributor
Posts: 167

Re: Treat Missing values as largest possible values

Hi Tmiles,

 

Thanks a lot for the reply.

 

I am worried about the situation where both values are missing. In that case the SUM function returns a missing value, which again compares as less than 0.05.

 

I modified the condition to the following to handle missing values by adding zero to each term, but if both A4 and B4 turn out to be missing, the difference is zero, which is again less than 0.05.

 

if (A1=B1) AND (A2=B2) AND ((abs(sum(A3,0) - sum(B3,0)) le 0.05) or (abs(sum(A4,0) - sum(B4,0)) le 0.05))

 

Regards,

sheeba

Contributor
Posts: 43

Re: Treat Missing values as largest possible values

[ Edited ]

You could always check for missing values before the subsetting IF and set them to a default value. This will only help if just one side of the equation is missing.

Is it safe to assume that if both sides of the equation are missing you want to handle the condition differently? If so, perhaps IF/THEN/ELSE logic would get you through it.

 

something like:

 

if n(a3,b3,a4,b4) > 0 then do;  /* N counts nonmissing values; a test on the sum would also fail for zero or negative totals */

  if (A1=B1) AND (A2=B2) AND (ABS(SUM(A3,-B3)) LE 0.05 OR ABS(SUM(A4,-B4)) LE 0.05) then ??;

end;

else do;

  ???

end;
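
One way the IF/THEN/ELSE outline could be filled in, as a sketch only: it assumes a pair should count toward a match only when both of its values are present, and it sets a flag rather than taking any particular action. The names check, pair3_ok, pair4_ok, and match, and the sample rows, are hypothetical.

```sas
data check;
  input A1 B1 A2 B2 A3 B3 A4 B4;
  pair3_ok = (n(A3, B3) = 2);   /* both values of the pair are nonmissing */
  pair4_ok = (n(A4, B4) = 2);
  if pair3_ok or pair4_ok then
    match = (A1 = B1) and (A2 = B2) and
            ((pair3_ok and abs(A3 - B3) le 0.05) or
             (pair4_ok and abs(A4 - B4) le 0.05));
  else match = 0;               /* no complete pair to compare */
  datalines;
1 1 2 2 1.00 1.03 . .
1 1 2 2 . 1.03 . .
1 1 2 2 . . 5.00 5.20
;
run;
/* match is 1 for the first row and 0 for the other two */
```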

Regular Contributor
Posts: 167

Re: Treat Missing values as largest possible values

Hi Tmiles,

 

Thanks for the quick reply.

 

Yes. I wouldn't want a match if both values are missing. Also, I am generating these conditions dynamically.

 

I will try this out.

 

Regards,

Sheeba

Regular Contributor
Posts: 167

Re: Treat Missing values as largest possible values

Hi Ballardw,

 

Thanks a lot for the reply.

 

Right now the situation of getting missing values in both columns doesn't exist, but I would like to modify the code to handle such situations as well. Thanks a lot for the code.

 

Also, I will consider subsetting the data to filter out these conditions.

 

Thanks again,

 

Regards,

Sheeba

Super User
Posts: 7,076

Re: Treat Missing values as largest possible values

[ Edited ]

If you are concerned that a condition like

(A <= 0.5)

is true when A is missing, just change the condition to account for missing values.

(.Z < A <= 0.5)

Or

(A <= 0.5 and not missing(A))

 

In your specific example you could just remove the ABS() function and code the positive and negative ranges.

-0.05 <= (A3-B3) <= 0.05
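
A quick sketch of both guards on sample data, assuming the tolerance of 0.05 from the original question. A missing difference fails the lower bound of the range test, and .Z is the highest special missing value, so anything greater than .Z is a real number. The dataset name range_check and the flags in_range and guarded are made up for illustration.

```sas
data range_check;
  input A3 B3;
  diff     = A3 - B3;
  in_range = (-0.05 <= diff <= 0.05);    /* missing diff fails the lower bound */
  guarded  = (.Z < abs(diff) <= 0.05);   /* missing is not greater than .Z */
  datalines;
1.00 1.03
. 1.03
. .
;
run;
/* row 1: in_range=1 guarded=1; rows 2 and 3: in_range=0 guarded=0 */
```
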
Regular Contributor
Posts: 167

Re: Treat Missing values as largest possible values

Hi Tom,

 

Thanks a lot for the suggestions. This is really helpful.

 

Regards,

sheeba .

☑ This topic is solved.
