## detect % of observations missing using BY

Solved
Super Contributor
Posts: 409

# detect % of observations missing using BY

Hi, Is there a way I can detect the % of the number of observations that have a valid (not missing or >0) value in a series of variables and using a By grouping.. Here's what I mean..

HAVE

Store   Product    Var1   Var2   Var3    Var4

A           A1           5       .         6          3

A           A1           7        6        0          9

A           A1           6        .         4          .

A           A2           .        5        3          3

A           A2           5        4         7         5

B           B1           3       .         0          8

B           B1          .        5         6          7

B           B2            .        5         8          9

Now if we look at the above data and see that for each Sore and Product, I want to see how many

. and 0's there are for only the Var1 to Var3 in total..  So what I'd like to see is:

Store   Product   %-missing

A           A1            33.33%   <- 3/9 - highlighted in red

A           A2            16.66%   <- 1/6 - highlighted in blue

B           B1             50%       <- 3/6 - highlighted in green

B           B2             33.33%   <- 1/3 - highlighted in purple

Thanks

Accepted Solutions
Solution
‎03-21-2012 01:36 PM
Contributor
Posts: 65

## Re: detect % of observations missing using BY

This is not a very elegant solution as I had to physically set 0's to missing and hardcode the number of variables (3), but if quick and dirty will do. . .

data have (drop = i);

array v (*) v1-v4;

input store \$ 1 product \$ 3-4 v1-v4;

do i = 1 to dim(v); if v(i) = 0 then v(i) = .; end;

datalines;

A A1 5 . 6 3

A A1 7 6 0 9

A A1 6 . 4 .

A A2 . 5 3 3

A A2 5 4 7 5

B B1 3 . 0 8

B B1 . 5 6 7

B B2 . 5 8 9

;;;;

run;

proc sql;

create table want as

select store,product,(nmiss(v1)+nmiss(v2)+nmiss(v3)) / (count(*)*3) as result

from have

group by store,product

;

quit;

Oh, and result is a proportion not a percent so you'd need to format it how you'd like to see it.

Karl

All Replies
Super User
Posts: 23,700

## detect % of observations missing using BY

It would take more than one step because you're looking at multiple variables and by groups.

Super User
Posts: 6,762

## detect % of observations missing using BY

Why do the calculations use Var1, Var2, and Var3, but ignore Var4?

It's not hard, I just want to know the rules first.

Super Contributor
Posts: 409

## detect % of observations missing using BY

Var4 is just a variable that doesn't have any data that's meaningful, or at least that needs to be considered in my analysis..

Super User
Posts: 6,762

## Re: detect % of observations missing using BY

OK, using only var1-var3 then:

data want;

set have;

by store product;

if first.product then do;

n_missing=0;

n_nonmissing=0;

end;

array keyvars {3} var1-var3;

do _n_=1 to 3;

if keyvars{_n_} in (0, .) then n_missing + 1;

else n_nonmissing + 1;

end;

if last.product;

pct_missing = 100 * n_missing / (n_missing + n_nonmissing);

keep store product pct_missing   /* possibly keep n_missing and n_nonmissing as well */;

run;

Good luck.

Solution
‎03-21-2012 01:36 PM
Contributor
Posts: 65

## Re: detect % of observations missing using BY

This is not a very elegant solution as I had to physically set 0's to missing and hardcode the number of variables (3), but if quick and dirty will do. . .

data have (drop = i);

array v (*) v1-v4;

input store \$ 1 product \$ 3-4 v1-v4;

do i = 1 to dim(v); if v(i) = 0 then v(i) = .; end;

datalines;

A A1 5 . 6 3

A A1 7 6 0 9

A A1 6 . 4 .

A A2 . 5 3 3

A A2 5 4 7 5

B B1 3 . 0 8

B B1 . 5 6 7

B B2 . 5 8 9

;;;;

run;

proc sql;

create table want as

select store,product,(nmiss(v1)+nmiss(v2)+nmiss(v3)) / (count(*)*3) as result

from have

group by store,product

;

quit;

Oh, and result is a proportion not a percent so you'd need to format it how you'd like to see it.

Karl

Super Contributor
Posts: 409

## Re: detect % of observations missing using BY

Thanks Karl.. great results..

Super Contributor
Posts: 409

## Re: detect % of observations missing using BY

Karl, I'm getting some weird answers when I'm trying only one variable (such as Price)...

select Store,Product,(nmiss(Price)) / (count(*)*1) as result

as I'd like to see how much of that field is being populated throughout the Stores/Products..

Contributor
Posts: 65

## Re: detect % of observations missing using BY

Hmm, I ran the code on your existing v1-v3 from the original post as

select store,product,nmiss(v1) / count(*) as result

from have

group by store,product;

and got the correct answers. (I removed the extraneous parens and the "*1", but it shouldn't make any difference.)

Did you remember to recode 0's as missing on your Price variable before running the SQL query? If not, maybe you could post some sample data that shows the problem and I'll take another look.

Karl

Super Contributor
Posts: 409

## Re: detect % of observations missing using BY

I ran the same code and got the correct results on the test data, but seem sshort on the main data.. I did change the 0's to . ..

I'll try again..

Super Contributor
Posts: 409

## Re: detect % of observations missing using BY

Just to make sure that I'm doing this correctly.. I have 13 varaibles that I need to convert 0's to missing.. so I used this:

array v (*) Price--NHIP;

do i = 1 to dim(v);if v(i) = 0 then v(i) = .;end;

and then;

select Store, Product, (nmiss(Price)) / Count(*)*1) as result

Does this look correct ?

Contributor
Posts: 65

## Re: detect % of observations missing using BY

Yes, it looks right (assuming the "group by" is in your actual query). As I said, you can drop the *1 if there's only one variable, but I tried it with and without just to convince myself.

If the results for Price don't "smell" right to you, you might try doing a proc freq on Price, before and after the recode to missing and see if there's something funky going on. Can happen if price is calculated by division and displays formatted as zero but actually is equal to a very small but non-zero number.

Super Contributor
Posts: 409

## Re: detect % of observations missing using BY

I'm stumped.. I treid everything.. it all makes sense, all the other variables by themselves are fine, except this Price..

could it be that Format = COMMA9. and Informat = BEST32.  ?

Contributor
Posts: 65

## Re: detect % of observations missing using BY

Gosh, I don't think so. I'm late for a meeting, but I'll reflect on it during the boring parts. In the meantime, what makes you think the results you're getting from the query are wrong?

Super Contributor
Posts: 409

## Re: detect % of observations missing using BY

I eyeballed the data and each of the observations has a Price listed, so it would be 100% and another product would be perhaps 90% , but am getting 0% for both..

🔒 This topic is solved and locked.