BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
podarum
Quartz | Level 8

Hi, Is there a way I can detect the % of the number of observations that have a valid (not missing or >0) value in a series of variables and using a By grouping.. Here's what I mean..

HAVE

Store   Product    Var1   Var2   Var3    Var4

  A           A1           5       .         6          3

  A           A1           7        6        0          9

  A           A1           6        .         4          .

  A           A2           .        5        3          3

  A           A2           5        4         7         5

  B           B1           3       .         0          8

  B           B1          .        5         6          7

  B           B2            .        5         8          9

Now if we look at the above data and see that for each Sore and Product, I want to see how many

. and 0's there are for only the Var1 to Var3 in total..  So what I'd like to see is:

Store   Product   %-missing

  A           A1            33.33%   <- 3/9 - highlighted in red

  A           A2            16.66%   <- 1/6 - highlighted in blue

  B           B1             50%       <- 3/6 - highlighted in green

  B           B2             33.33%   <- 1/3 - highlighted in purple

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
KarlK
Fluorite | Level 6

This is not a very elegant solution as I had to physically set 0's to missing and hardcode the number of variables (3), but if quick and dirty will do. . .

data have (drop = i);

array v (*) v1-v4;

input store $ 1 product $ 3-4 v1-v4;

do i = 1 to dim(v); if v(i) = 0 then v(i) = .; end;

datalines;

A A1 5 . 6 3

A A1 7 6 0 9

A A1 6 . 4 .

A A2 . 5 3 3

A A2 5 4 7 5

B B1 3 . 0 8

B B1 . 5 6 7

B B2 . 5 8 9

;;;;

run;

proc sql;

create table want as

select store,product,(nmiss(v1)+nmiss(v2)+nmiss(v3)) / (count(*)*3) as result

from have

group by store,product

;

quit;

Oh, and result is a proportion not a percent so you'd need to format it how you'd like to see it.

Karl

View solution in original post

17 REPLIES 17
Reeza
Super User

It would take more than one step because you're looking at multiple variables and by groups.

Astounding
PROC Star

Why do the calculations use Var1, Var2, and Var3, but ignore Var4?

It's not hard, I just want to know the rules first.

podarum
Quartz | Level 8

Var4 is just a variable that doesn't have any data that's meaningful, or at least that needs to be considered in my analysis..

Astounding
PROC Star

OK, using only var1-var3 then:

data want;

   set have;

   by store product;

   if first.product then do;

     n_missing=0;

     n_nonmissing=0;

   end;

   array keyvars {3} var1-var3;

   do _n_=1 to 3;

       if keyvars{_n_} in (0, .) then n_missing + 1;

       else n_nonmissing + 1;

   end;

   if last.product;

   pct_missing = 100 * n_missing / (n_missing + n_nonmissing);

   keep store product pct_missing   /* possibly keep n_missing and n_nonmissing as well */;

run;

Good luck.

KarlK
Fluorite | Level 6

This is not a very elegant solution as I had to physically set 0's to missing and hardcode the number of variables (3), but if quick and dirty will do. . .

data have (drop = i);

array v (*) v1-v4;

input store $ 1 product $ 3-4 v1-v4;

do i = 1 to dim(v); if v(i) = 0 then v(i) = .; end;

datalines;

A A1 5 . 6 3

A A1 7 6 0 9

A A1 6 . 4 .

A A2 . 5 3 3

A A2 5 4 7 5

B B1 3 . 0 8

B B1 . 5 6 7

B B2 . 5 8 9

;;;;

run;

proc sql;

create table want as

select store,product,(nmiss(v1)+nmiss(v2)+nmiss(v3)) / (count(*)*3) as result

from have

group by store,product

;

quit;

Oh, and result is a proportion not a percent so you'd need to format it how you'd like to see it.

Karl

podarum
Quartz | Level 8

Thanks Karl.. great results..

podarum
Quartz | Level 8

Karl, I'm getting some weird answers when I'm trying only one variable (such as Price)...

select Store,Product,(nmiss(Price)) / (count(*)*1) as result

as I'd like to see how much of that field is being populated throughout the Stores/Products..

KarlK
Fluorite | Level 6

Hmm, I ran the code on your existing v1-v3 from the original post as

select store,product,nmiss(v1) / count(*) as result

from have

group by store,product;

and got the correct answers. (I removed the extraneous parens and the "*1", but it shouldn't make any difference.)

Did you remember to recode 0's as missing on your Price variable before running the SQL query? If not, maybe you could post some sample data that shows the problem and I'll take another look.

Karl

podarum
Quartz | Level 8

I ran the same code and got the correct results on the test data, but seem sshort on the main data.. I did change the 0's to . ..

I'll try again..

podarum
Quartz | Level 8

Just to make sure that I'm doing this correctly.. I have 13 varaibles that I need to convert 0's to missing.. so I used this:

array v (*) Price--NHIP;

do i = 1 to dim(v);if v(i) = 0 then v(i) = .;end;

and then;

select Store, Product, (nmiss(Price)) / Count(*)*1) as result

Does this look correct ?

KarlK
Fluorite | Level 6

Yes, it looks right (assuming the "group by" is in your actual query). As I said, you can drop the *1 if there's only one variable, but I tried it with and without just to convince myself.

If the results for Price don't "smell" right to you, you might try doing a proc freq on Price, before and after the recode to missing and see if there's something funky going on. Can happen if price is calculated by division and displays formatted as zero but actually is equal to a very small but non-zero number.

podarum
Quartz | Level 8

I'm stumped.. I treid everything.. it all makes sense, all the other variables by themselves are fine, except this Price..

could it be that Format = COMMA9. and Informat = BEST32.  ?

KarlK
Fluorite | Level 6

Gosh, I don't think so. I'm late for a meeting, but I'll reflect on it during the boring parts. In the meantime, what makes you think the results you're getting from the query are wrong?

podarum
Quartz | Level 8

I eyeballed the data and each of the observations has a Price listed, so it would be 100% and another product would be perhaps 90% , but am getting 0% for both..

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 17 replies
  • 1308 views
  • 0 likes
  • 5 in conversation