BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ttatum
Calcite | Level 5

Hi,

I have a dataset like the following:

CompanyVAR1VAR2VAR3VAR4VAR5
Company1028000
Company24001219
Company37065525
Company4802307
Company51015103452

The dataset has a total of 160 variables, of which I want to find summary statistics for each variable. However, I first want to determine the number of non-zero observations for each variable. To do this with a small number of variables I would probably do the following:

Proc sql;

create table want as

  select *,

    sum(case when VAR1 ne 0 then 1 else 0 end) as count1,

    sum(case when VAR2 ne 0 then 1 else 0 end) as count2

from have;

Quit;

However, this is code is not as useful when I have 160 variables I want to get counts for. Is there an easier way this can be done? I assume this can be done with proc freq or proc summary, but am not very familiar with these procs. Keep in mind I also want to keep the values for each variable as I want to eventually gather summary statistics for each variable. Eventually I would want my final dataset to look something like this:

VariableNumber of non-zero observations for each variableMinMaxMeanMedian
Var14
Var22
Var33
Var43
Var54
Var6...etc.
Var7...etc.

I can probably handle the transposing and re-arranging the data from a proc means output, so I was mostly wondering what the best way to get the count of non-zero observations for each variable.

Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

OK, in that case ...

There is little to change about the PROC MEANS.  It can run on your original data set, but does not require the N statistic as part of the output.

PROC FREQ can generate the counts, but it would probably be easier to use a DATA step:

data want;

   set have end=done;

   array in {160} var1-var160;

   array out {160} count1-count160;

   retain count1-count160 0;

   do _i_=1 to 160;

      if in{_i_} not in (., 0) then out{_i_} + 1;

   end;

   if done;

   keep count1-count160;

run;

So already the code gets a bit complex.  Then there is the requirement to combine both sets of results, which uses perhaps more complex tools.  How comfortable are you with complex code?  Would you prefer to drop the requirement about just counting the non-zeros?  Either path is possible.

View solution in original post

6 REPLIES 6
Astounding
PROC Star

There are ways to do this, but first you have to clarify what you want.  Are the Min, Max, Mean, and Median supposed to be based on the non-zero values only?  That's the easiest case of all.  Just change all zeros to missing values:

data want;

set have;

array vars {160} var1-var160;

do _i_=1 to 160;

  if vars{_i_}=0 then vars{_i_}=.;

end;

run;

proc means data=want n min max mean median;

var var1-var160;

run;

If you want statistics to include all the zeros, that will take a little more work, but is still fairly easy.  So what are you looking for?

Good luck.

ttatum
Calcite | Level 5

Thank you for the response. I would like to include any zero or negative values in my summary statistics, but have a count of how many observations were not equal to zero for each variable.

Astounding
PROC Star

OK, in that case ...

There is little to change about the PROC MEANS.  It can run on your original data set, but does not require the N statistic as part of the output.

PROC FREQ can generate the counts, but it would probably be easier to use a DATA step:

data want;

   set have end=done;

   array in {160} var1-var160;

   array out {160} count1-count160;

   retain count1-count160 0;

   do _i_=1 to 160;

      if in{_i_} not in (., 0) then out{_i_} + 1;

   end;

   if done;

   keep count1-count160;

run;

So already the code gets a bit complex.  Then there is the requirement to combine both sets of results, which uses perhaps more complex tools.  How comfortable are you with complex code?  Would you prefer to drop the requirement about just counting the non-zeros?  Either path is possible.

ttatum
Calcite | Level 5

Thank you. This should work. I can probably get my data into the final desired format from the code provided by you and ballerdw + Jag.

Jagadishkatam
Amethyst | Level 16

I believe   has provided you the right answer. The below will generate the output similar to but only nonzero obs count. Hope it helps

Please try

data have;

input Company$    VAR1    VAR2    VAR3    VAR4    VAR5;

cards;

Company1    0    28    0    0    0

Company2    4    0    0    12    19

Company3    7    0    6    55    25

Company4    8    0    23    0    7

Company5    10    15    10    34    52

;

proc transpose data=have;

run;

data want;

set data1;

array cnt col1-col99;

array cnts cols1-cols99;

do over cnt;

if cnt in (0,.) then cnts=1;

else cnts=.;

allcnts=nmiss(of cols1-cols99);

end;

drop col: cols:;

run;

Thanks,

Jag

Thanks,
Jag
ballardw
Super User

And by non- zero do you have an negative values to consider?

If you want a count one way is a custom format and proc freq/report/tabulate

proc format;

value nonzero

0='Zero'

0<-high = 'Nonzero'  /* assumes all the non-zero values of interest are positive*/

;

run;

proc freq data= have;

     tables _numeric_ ;

     format _numeric_ nonzero.;

run;

Or any list of variables will work.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 14381 views
  • 8 likes
  • 4 in conversation