Solved: Re: How To - Create Count of Variable with Missing Value

turcay · Posted 12-02-2015 05:07 AM

Hello everyone,

I have a sample dataset. I want to see count of “Column1” Variable values. When I use PROC FREQ it creates what I want but I also want to see missing values as zero(0). I added my desired output. I’m not sure maybe I need to create PROC FORMAT or another procedure.

data have;
length Column1 8;
infile datalines missover dlm=",";
input Column1;
datalines;
2
4
5
2
4
5
2
4
5
;
Run;
Proc freq data=have;
table Column1 / out=want;
run;

Thank you.

FreelanceReinh · Posted 12-02-2015 07:46 AM

I think you are looking for an equivalent of PRELOADFMT for PROC FREQ. But to my knowledge this does not exist, in spite of numerous discussions, papers and SASware ballot entries about this blatantly missing feature.

In many cases you will have a format which contains all possible categories. (The format labels are irrelevant.) So, you can use (part of) the CNTLOUT dataset of that format for a left join with the PROC FREQ output dataset, as shown in the following example, based on your dataset WANT:

proc format;
value myfmt
1='1'
2='2'
3='3'
4='4'
5='5'
;
run;

proc format cntlout=fmt(keep=start);
select myfmt;
run;

proc sql;
create table want1 as
select input(start, 16.)     as column1,
       coalesce(a.count,0)   as count,
       coalesce(a.percent,0) as percent
from fmt left join want a
on a.column1=input(start, 16.);
quit;

As an alternative to the PROC SQL step, you can use one or two data steps: one for converting the character values of variable START to numeric values (this step could be omitted if the "COLUMN1" variable was character and the values of the renamed START variable matched COLUMN1 values) and one MERGE step as in the following example:

data numfmt;
set fmt;
column1=input(start, 16.);
drop start;
run;

data want1;
merge want(in=a)
      numfmt;
by column1;
if ~a then do;
  count=0;
  percent=0;
end;
run;

The advantage of the data step solution is that you work with the original variables of WANT. In particular, COUNT and PERCENT keep their original labels.

View solution in original post

RW9 · Posted 12-02-2015 05:51 AM

Check the documentation, you want and out=, and the nmiss option:

https://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_se...

FreelanceReinh · Posted 12-02-2015 07:46 AM

I think you are looking for an equivalent of PRELOADFMT for PROC FREQ. But to my knowledge this does not exist, in spite of numerous discussions, papers and SASware ballot entries about this blatantly missing feature.

In many cases you will have a format which contains all possible categories. (The format labels are irrelevant.) So, you can use (part of) the CNTLOUT dataset of that format for a left join with the PROC FREQ output dataset, as shown in the following example, based on your dataset WANT:

proc format;
value myfmt
1='1'
2='2'
3='3'
4='4'
5='5'
;
run;

proc format cntlout=fmt(keep=start);
select myfmt;
run;

proc sql;
create table want1 as
select input(start, 16.)     as column1,
       coalesce(a.count,0)   as count,
       coalesce(a.percent,0) as percent
from fmt left join want a
on a.column1=input(start, 16.);
quit;

As an alternative to the PROC SQL step, you can use one or two data steps: one for converting the character values of variable START to numeric values (this step could be omitted if the "COLUMN1" variable was character and the values of the renamed START variable matched COLUMN1 values) and one MERGE step as in the following example:

data numfmt;
set fmt;
column1=input(start, 16.);
drop start;
run;

data want1;
merge want(in=a)
      numfmt;
by column1;
if ~a then do;
  count=0;
  percent=0;
end;
run;

The advantage of the data step solution is that you work with the original variables of WANT. In particular, COUNT and PERCENT keep their original labels.

Reeza · Posted 12-02-2015 10:57 AM

There are several ways and they're well documented in this paper.

http://www.lexjansen.com/nesug/nesug11/cc/cc29.pdf

An additional note - if the value is listed in any of the levels you can use the SPARSE option in proc freq. But if it's not available anywhere then you'll have to use a different method.

FreelanceReinh · Posted 12-02-2015 12:13 PM

@Reeza: Sure, the SPARSE option is great, if you have a cross tabulation and there are no rows or columns with only zero frequencies. Given @turcay's HAVE dataset, it would not help, even if one cross-tabulated Column1 with a dummy variable.

turcay · Posted 12-02-2015 01:54 PM

Hello everyone,

First of all , thank you for your helps. I checked all your advises and @FreelanceReinh's sample codes helped me and I created what I want. It is always good to see sample code to understand best. I'm not familiar with COALESCE function I checked this link https://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002518172.htm and as far as I understand It returns numeric(specified) value instead of missing values. Your two method was beneficial for me but I used your first method.

Thanks eveyrbody.

Reeza · Posted 12-02-2015 02:55 PM

Coalesce will check a list of variables, it can be more than two and assigns the first non-missing value it encounters.

ie coalesce(var1, var2, var3)
if var1 & var2 are missing then it returns var3
if var1 is missing then it returns var2

turcay · Posted 12-02-2015 02:59 PM

Thank you @Reeza

Firstly, I couldn't understand well after I examined detailed I thought like your expression now I became clear.

Thanks again.

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!