Solved: How do I include 0 counts for possible values that aren't in the data ...

Hello_there · Posted 12-28-2022 12:52 PM

Hi,

This is a data manipulation exercise. How do I include 0 counts for possible values that aren't in the data set?

The possible values are the colors of the rainbow, ROYGBIV (red, orange, yellow, green, blue, indigo, violet). Currently, indigo is missing from the data.

Thanks

data have;
infile datalines dsd dlm=",";
	input color $;
datalines;
blue
red
red
blue 
green
red
orange
red
blue
violet
red
orange
yellow
;
run;

Desired output: (sort order doesn't matter)

PeterClemmensen · Posted 12-28-2022 02:19 PM

Try this

data have;
infile datalines dsd dlm=",";
input color $;
datalines;
blue   
red    
red    
blue   
green  
red    
orange 
red    
blue   
violet 
red    
orange 
yellow 
;

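proc format;
 /* List all seven possible colors, including any that are absent from HAVE */
 value $col     "red"     = "red" 
                "orange"  = "orange" 
                "yellow"  = "yellow" 
                "green"   = "green" 
                "blue"    = "blue" 
                "indigo"  = "indigo" 
                "violet"  = "violet" 
 ;
run;

/* COMPLETETYPES with PRELOADFMT adds a zero-count row for each format value not found in the data */
proc summary data = have nway completetypes;
   class color / preloadfmt order = formatted missing;
   format color $col.;
   output out = want(drop = _TYPE_ rename = (_FREQ_ = count));
run;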

The DATA to DATA Step Macro
Blog: SASnrd

fja · Posted 12-28-2022 02:11 PM

Would you please have a look at this posting: https://communities.sas.com/t5/SAS-Programming/Return-count-of-0-in-a-Group-By-SQL-Statement/m-p/539...
This should be applicable in your case too ...

Hello_there · Posted 12-28-2022 05:34 PM

Thanks for replying, but this solution is slightly different because my data set does not include the possible value that I need.

The solution in the linked post appears to build the result dynamically from all combinations of the distinct values of the two grouping variables that are already present in the data set.

Hello_there · Posted 12-28-2022 05:46 PM

Thanks!

andreas_lds · Posted 12-28-2022 03:35 PM

Using a format, as shown by @PeterClemmensen , is the recommended way, because just one more step is required.

fja · Posted 12-28-2022 04:51 PM

@andreas_lds wrote:

Using a format, as shown by @PeterClemmensen , is the recommended way, because just one more step is required.

I really do appreciate the two of you sharing your expertise here ... but isn't the PROC FORMAT solution more like exploiting a side effect? Isn't there an "official" way of addressing this issue?

PaigeMiller · Posted 12-28-2022 05:09 PM

In SAS, there are often many ways to get to the desired result. None of them are "official". One may be easier than another; one may require less code than another; one may execute faster than another.

--
Paige Miller

fja · Posted 12-29-2022 04:06 AM

@PaigeMiller wrote:

In SAS, there are often many ways to get to the desired result. None of them are "official". One may be easier than another; one may require less code than another; one may execute faster than another.

That is understandable ... could you name some advantages of the solution using "proc format"? This time just to help me ... 🙂 (as there is an accepted solution already).

FreelanceReinh · Posted 12-30-2022 10:56 AM

@fja wrote:
... could you name some advantages of the solution using "proc format"?

I think in many practical use cases, there is a suitable format already available and associated with the categorical variable in question. In this situation, neither the PROC FORMAT step nor the FORMAT statement as shown in PeterClemmensen's solution would be needed. (The existence of a suitable dataset for the CLASSDATA= option is probably less common.) Typically, this format would map numeric or short character codes to longer text descriptions of the categories.

A special feature of the PRELOADFMT approach is available when used in conjunction with the ORDER=DATA option of the CLASS statement (of PROC MEANS, PROC SUMMARY or PROC TABULATE): If the format definition used the NOTSORTED option of PROC FORMAT's VALUE statement, the order of categories in the output will match the order from the format definition -- regardless of the (alphabetic) order of the formatted values, the default order of the unformatted values, the order in which they occur in the input dataset (!) and their frequencies. This is very useful if the PROC FORMAT code was written in view of the output specifications (e.g., table shells in a statistical analysis plan).
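
A minimal sketch of the ordering feature described above, assuming the dataset HAVE from the original question. The format name $rainbow and the output dataset name want_ordered are made up for illustration; COMPLETETYPES is included so that the absent category (indigo) still gets a row.

```sas
proc format;
   /* NOTSORTED keeps the values in the order they are written here */
   value $rainbow (notsorted)
      "red"    = "red"
      "orange" = "orange"
      "yellow" = "yellow"
      "green"  = "green"
      "blue"   = "blue"
      "indigo" = "indigo"
      "violet" = "violet"
   ;
run;

proc summary data = have nway completetypes;
   /* ORDER=DATA combined with PRELOADFMT follows the stored format order */
   class color / preloadfmt order = data;
   format color $rainbow.;
   output out = want_ordered(drop = _TYPE_ rename = (_FREQ_ = count));
run;

proc print data = want_ordered noobs;
run;
```

If this behaves as described, WANT_ORDERED should list the colors in ROYGBIV order rather than alphabetically or in order of appearance in HAVE.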

FreelanceReinh · Posted 12-28-2022 04:47 PM

With PROC SUMMARY and PROC MEANS you can also use a CLASSDATA= dataset which contains (at least) the missing categories.

Simplified example (creating only printed output with a different header of the count column) using PROC MEANS:

data cd;
color='indigo  ';
run;

proc means data=have classdata=cd;
class color;
run;

(Note the two trailing blanks in 'indigo  ' to make variable COLOR the same length as in dataset HAVE.)
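
If a dataset is needed rather than printed output, the same CLASSDATA= idea should carry over to PROC SUMMARY. This is only a sketch, assuming the datasets HAVE and CD as defined above; the output dataset name want_cd is hypothetical.

```sas
/* Class levels are taken from both HAVE and CD, so indigo gets a row too */
proc summary data = have classdata = cd nway;
   class color;
   output out = want_cd(drop = _TYPE_ rename = (_FREQ_ = count));
run;

proc print data = want_cd noobs;
run;
```

Without the EXCLUSIVE option, the levels found in HAVE are kept alongside those listed in CD, which is why CD only needs to contain the missing category.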

Hello_there · Posted 12-28-2022 05:44 PM

Thanks, FreelanceReinhard!

This is a very clean way of doing it.

Edit: This post was edited to remove some follow-up questions that I had.

Hello_there · Posted 12-28-2022 06:20 PM

I edited this post, but originally I had questions about how I would make this method more robust, particularly for cases where the data set is updated and the counts change, and whether there is a way to macrotize the length value of the data set so I can use it for the cd data set. After thinking about it some more, though, my use case involves knowing the categories beforehand, so I would already know the maximum length and it would be OK to hardcode it in the data set cd.

FreelanceReinh · Posted 12-29-2022 08:13 AM

@Hello_there wrote:
I edited this post, but originally I had questions about how I would make this method more robust, particularly for cases where the data set is updated and the counts change, and whether there is a way to macrotize the length value of the data set so I can use it for the cd data set. After thinking about it some more, though, my use case involves knowing the categories beforehand, so I would already know the maximum length and it would be OK to hardcode it in the data set cd.

As long as the CD dataset contains all CLASS variable values that would be absent otherwise, the approach should work regardless of changed counts in dataset HAVE. It wouldn't hurt if CD redundantly contained values which are present in HAVE as well.

There is no need to hardcode the length of variable COLOR in dataset CD as you can always retrieve it from dataset HAVE. (Edit: ... assuming that the length there is also sufficient to accommodate the new values in CD.)

Example:

data cd;
if 0 then set have(keep=color);
input color;
cards;
indigo
;
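
One way to confirm that the length really was picked up from HAVE is to compare the variable attributes of both datasets. The sketch below assumes HAVE exists as in the original question; the dataset name cd_check is invented for this illustration.

```sas
data cd_check;
   /* The condition 0 is always false, so SET never executes and reads no rows. */
   /* The compiler still processes it and copies COLOR's type and length from HAVE. */
   if 0 then set have(keep = color);
   /* COLOR is already defined as character, so no $ is needed here */
   input color;
   cards;
indigo
;

/* Compare the type and length of COLOR in the two datasets */
proc contents data = have;
run;

proc contents data = cd_check;
run;
```

Both PROC CONTENTS listings should then report the same type and length for COLOR.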

Hello_there · Posted 12-29-2022 08:24 AM

Thanks, again.

Could you explain what the 0 does in the "if 0 then set have(keep=color);" part?
