turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- SAS Procedures
- /
- detect % of observations missing using BY

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-21-2012 12:14 PM

Hi, Is there a way I can detect the % of the number of observations that have a valid (not missing or >0) value in a series of variables and using a By grouping.. Here's what I mean..

HAVE

**Store** **Product** **Var1** **Var2** **Var3** **Var4**

A A1 5 **.** 6 3

A A1 7 6 **0** 9

A A1 6 **. ** 4 .

A A2 **. ** 5 3 3

A A2 5 4 7 5

B B1 3 **.** **0** 8

B B1 **.** 5 6 7

B B2 **. ** 5 8 9

Now if we look at the above data and see that for each Sore and Product, I want to see how many

. and 0's there are for only the Var1 to Var3 in total.. So what I'd like to see is:

**Store** **Product**** ****%-missing**

A A1 33.33% <- 3/9 - highlighted in red

A A2 16.66% <- 1/6 - highlighted in blue

B B1 50% <- 3/6 - highlighted in green

B B2 33.33% <- 1/3 - highlighted in purple

Thanks

Accepted Solutions

Solution

03-21-2012
01:36 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-21-2012 01:36 PM

This is not a very elegant solution as I had to physically set 0's to missing and hardcode the number of variables (3), but if quick and dirty will do. . .

data have (drop = i);

array v (*) v1-v4;

input store $ 1 product $ 3-4 v1-v4;

do i = 1 to dim(v); if v(i) = 0 then v(i) = .; end;

datalines;

A A1 5 . 6 3

A A1 7 6 0 9

A A1 6 . 4 .

A A2 . 5 3 3

A A2 5 4 7 5

B B1 3 . 0 8

B B1 . 5 6 7

B B2 . 5 8 9

;;;;

run;

proc sql;

create table want as

select store,product,(nmiss(v1)+nmiss(v2)+nmiss(v3)) / (count(*)*3) as result

from have

group by store,product

;

quit;

Oh, and result is a proportion not a percent so you'd need to format it how you'd like to see it.

Karl

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-21-2012 12:51 PM

It would take more than one step because you're looking at multiple variables and by groups.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-21-2012 01:16 PM

Why do the calculations use Var1, Var2, and Var3, but ignore Var4?

It's not hard, I just want to know the rules first.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-21-2012 01:39 PM

Var4 is just a variable that doesn't have any data that's meaningful, or at least that needs to be considered in my analysis..

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-21-2012 01:55 PM

OK, using only var1-var3 then:

data want;

set have;

by store product;

if first.product then do;

n_missing=0;

n_nonmissing=0;

end;

array keyvars {3} var1-var3;

do _n_=1 to 3;

if keyvars{_n_} in (0, .) then n_missing + 1;

else n_nonmissing + 1;

end;

if last.product;

pct_missing = 100 * n_missing / (n_missing + n_nonmissing);

keep store product pct_missing /* possibly keep n_missing and n_nonmissing as well */;

run;

Good luck.

Solution

03-21-2012
01:36 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-21-2012 01:36 PM

This is not a very elegant solution as I had to physically set 0's to missing and hardcode the number of variables (3), but if quick and dirty will do. . .

data have (drop = i);

array v (*) v1-v4;

input store $ 1 product $ 3-4 v1-v4;

do i = 1 to dim(v); if v(i) = 0 then v(i) = .; end;

datalines;

A A1 5 . 6 3

A A1 7 6 0 9

A A1 6 . 4 .

A A2 . 5 3 3

A A2 5 4 7 5

B B1 3 . 0 8

B B1 . 5 6 7

B B2 . 5 8 9

;;;;

run;

proc sql;

create table want as

select store,product,(nmiss(v1)+nmiss(v2)+nmiss(v3)) / (count(*)*3) as result

from have

group by store,product

;

quit;

Oh, and result is a proportion not a percent so you'd need to format it how you'd like to see it.

Karl

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-21-2012 02:17 PM

Thanks Karl.. great results..

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-21-2012 02:25 PM

Karl, I'm getting some weird answers when I'm trying only one variable (such as Price)...

select Store,Product,(nmiss(Price)) / (count(*)***1) as result**

as I'd like to see how much of that field is being populated throughout the Stores/Products..

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-21-2012 02:36 PM

Hmm, I ran the code on your existing v1-v3 from the original post as

select store,product,nmiss(v1) / count(*) as result

from have

group by store,product;

and got the correct answers. (I removed the extraneous parens and the "*1", but it shouldn't make any difference.)

Did you remember to recode 0's as missing on your Price variable before running the SQL query? If not, maybe you could post some sample data that shows the problem and I'll take another look.

Karl

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-21-2012 02:43 PM

I ran the same code and got the correct results on the test data, but seem sshort on the main data.. I did change the 0's to . ..

I'll try again..

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-21-2012 02:53 PM

Just to make sure that I'm doing this correctly.. I have 13 varaibles that I need to convert 0's to missing.. so I used this:

array v (*) Price--NHIP;

do i = 1 to dim(v);if v(i) = 0 then v(i) = .;end;

and then;

select Store, Product, (nmiss(Price)) / Count(*)*1) as result

Does this look correct ?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-21-2012 03:01 PM

Yes, it looks right (assuming the "group by" is in your actual query). As I said, you can drop the *1 if there's only one variable, but I tried it with and without just to convince myself.

If the results for Price don't "smell" right to you, you might try doing a proc freq on Price, before and after the recode to missing and see if there's something funky going on. Can happen if price is calculated by division and displays formatted as zero but actually is equal to a very small but non-zero number.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-21-2012 03:13 PM

I'm stumped.. I treid everything.. it all makes sense, all the other variables by themselves are fine, except this Price..

could it be that Format = COMMA9. and Informat = BEST32. ?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-21-2012 03:16 PM

Gosh, I don't think so. I'm late for a meeting, but I'll reflect on it during the boring parts. In the meantime, what makes you think the results you're getting from the query are wrong?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-21-2012 03:19 PM

I eyeballed the data and each of the observations has a Price listed, so it would be 100% and another product would be perhaps 90% , but am getting 0% for both..