BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
scify
Obsidian | Level 7

This feels like a simple question, but I haven't been able to find the answer anywhere (I'm probably just not wording it properly). Say I have the following data

 

ID         Type
1          A
1          A
1          B
1          A
2          A
2          A
3          B
3          B

And I've summarized it for PROC Report in the following way:

ID         Type    Count
1          A          3
1          B          1
2          A          2
3          B          2

Is there a way, either during the summarizing or the PROC, to have SAS impute the missing values of Type and set Count to 0? I.E.

ID         Type    Count
1          A          3
1          B          1
2          A          2
2          B          0
3          A          0
3          B          2

I know Type will always be either A or B, but not all IDs will have at least one instance of both, and for consistency in the PROC Report, I'd like to always have both rows.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

You will generally have more control if you just write a program to do the counting and just use PROC REPORT or other tools to make the presentation.

 

So for this example where there is one variable with a fixed set of values that might not all be present you can first get the summary of values that are.  This will also produce the set of values for the other grouping variables that want to use as the basis for adding in the empty cells.

proc summary data=have nway ;
  class id name type ;
  output out=step1 ;
run;

So in this case we will get all of the ID*NAME combinations but some of them will only have some of the levels of TYPE.  So let's use that to build a dataset that has all of the possible levels for TYPE for all of the those combinations of ID and NAME and just set the count to zero.

data step2;
  set step1;
  by id name;
  if first.name;
  _freq_=0;
  do type='A','B';
     output;
  end;
run;

Now we can combine the two and get the full dataset. The counts that actually existed will overwrite the zeros we generated in STEP2.

data report;
   merge step2 step1;
   by id name type;
run;

Now we can make our report.

proc print data=report label;
  var id name type _freq_;
  label _freq_='Count';
run;
  

View solution in original post

15 REPLIES 15
scify
Obsidian | Level 7

That's definitely a start! It works for the simple example I posted above, but if I try to add another grouping column (say, Name,which has a one-to-one match with ID), it breaks rather spectacularly. Do you know if there's a way to make it work with three variables (ID, Name, and Type) instead of two?

andreas_lds
Jade | Level 19

You have to change the code summarising the data:

data work.Original;
   input ID Type $;
   datalines;
1          A
1          A
1          B
1          A
2          A
2          A
3          B
3          B
;

proc summary data=work.Original nway completetypes;
   class ID Type;
   output out= work.Counted(drop= _type_ rename=(_freq_ = count));
run;
Ksharp
Super User
data work.Original;
   input ID Type $;
   datalines;
1          A
1          A
1          B
1          A
2          A
2          A
3          B
3          B
;
proc freq data=Original;
table id*type/list sparse nocum nopercent;
run;
Tom
Super User Tom
Super User

Something like this?

data have ;
  input ID Type $ @@;
cards;
1 A 1 A
1 B 1 A
2 A 2 A
3 B 3 B
;

proc format ;
  value $types 'A'='A' 'B'='B';
run;

proc report data=have completerows;
  column id type n ;
  define id / group ;
  define type / group format=$types. PRELOADFMT ;
run;
scify
Obsidian | Level 7

Pretty much. I mentioned this above in my response to Reeza, but the issue I'm hitting now is that I've been asked to add another column (Name) which has a 1-to-1 match with ID, but is causing the Proc Report to go a little crazy in the output. Do you know if there's a way to handle that, other than dumping the entirety of Name into its own format and using preloadfmt there as well?

Reeza
Super User
If the data combination doesn't exist in the raw data you have to tell SAS somehow that it does exist and a format or classdata does that.
Or you pre-process the data, which may still require a format, classdata or SQL logic to mimic the functionality.

scify
Obsidian | Level 7
data original;
	input ID Name $ Type $;
	datalines;
1   Timmy     A
1   Timmy     A
1   Timmy     B
1   Timmy     A
2   Sarah     B
2   Sarah     B
3   John      A
3   John      A
;

proc sql;
	create table summary as
	select ID, Name, Type, count(*) as Count from original
	group by 1,2,3;
quit;

proc format;
	value $type 'A'='A' 'B'='B';

proc report data=summary completerows;
	column ID Name Type (Count);
	define ID / group  ;
	define Name / group;
	DEFINE Type / preloadfmt format=$type. exclusive;
	define Count / display;
run;

The above gives me this as the output:

ID Name Type Count
1 John   0
      0
      0
      0
  Sarah   0
      0
      0
      0
  Timmy A 3
    B 1
2 John   0
... etc      

 

 

 

Edit: And fixing my mistake of not labeling the Type column as a group variable produces this:

ID Name Type Count
1 John A 0
      0
      0
      0
    B 0
      0
      0
      0
  Sarah A 0
etc...      

 

Reeza
Super User

Yes, because COMPLETEROWS will essentially create a cartesian product between all your group variables . Which likely means you have missing TYPE somewhere in your data. If you cannot use the COMPLETEROWS option you have to use either CLASSDATA or PRELOADFMT ahead of time to structure your data for proc report. Or if you don't have ID as a group variable, as a display variable that would also work but then it would repeat.

PROC TABULATE with PRELOADFMT will probably give you what you need.

COMPLETEROWS
creates all possible combinations of the group variable values.

scify
Obsidian | Level 7

Oddly, it looks like the order and type of the variables is impacting things. If I change my proc report to this:

proc report data=summary completerows;
	column Name Type (ID Count);
	define Name / group;
	DEFINE Type / group preloadfmt format=$type. exclusive;
	define ID / 'ID' ;
	define Count / 'Count';
run;

it works fine. Unfortunately, this won't work with my actual data--if I define ID as character, it goes right back to being broken, and in my real dataset, it's a character variable due to leading 0s. It also, of course, starts using it in break / summarize sections, which isn't ideal...

Reeza
Super User
Try removing COMPLETEROWS if you're using PRELOADFMT, both are not necessary.
scify
Obsidian | Level 7

Removing COMPLETEROWS takes it back to the initial issue: it only displays the values of TYPE that exist for each ID/Name row, instead of having both.

Reeza
Super User
Show what you want from your Original data set please.
Tom
Super User Tom
Super User

You will generally have more control if you just write a program to do the counting and just use PROC REPORT or other tools to make the presentation.

 

So for this example where there is one variable with a fixed set of values that might not all be present you can first get the summary of values that are.  This will also produce the set of values for the other grouping variables that want to use as the basis for adding in the empty cells.

proc summary data=have nway ;
  class id name type ;
  output out=step1 ;
run;

So in this case we will get all of the ID*NAME combinations but some of them will only have some of the levels of TYPE.  So let's use that to build a dataset that has all of the possible levels for TYPE for all of the those combinations of ID and NAME and just set the count to zero.

data step2;
  set step1;
  by id name;
  if first.name;
  _freq_=0;
  do type='A','B';
     output;
  end;
run;

Now we can combine the two and get the full dataset. The counts that actually existed will overwrite the zeros we generated in STEP2.

data report;
   merge step2 step1;
   by id name type;
run;

Now we can make our report.

proc print data=report label;
  var id name type _freq_;
  label _freq_='Count';
run;
  

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 15 replies
  • 3147 views
  • 5 likes
  • 5 in conversation