Solved: Imputing missing values to 0 when grouping variable not present

scify · Posted 11-09-2021 04:47 PM

This feels like a simple question, but I haven't been able to find the answer anywhere (I'm probably just not wording it properly). Say I have the following data

ID         Type
1          A
1          A
1          B
1          A
2          A
2          A
3          B
3          B

And I've summarized it for PROC Report in the following way:

ID         Type    Count
1          A          3
1          B          1
2          A          2
3          B          2

Is there a way, either during the summarizing or the PROC, to have SAS impute the missing values of Type and set Count to 0? I.E.

ID         Type    Count
1          A          3
1          B          1
2          A          2
2          B          0
3          A          0
3          B          2

I know Type will always be either A or B, but not all IDs will have at least one instance of both, and for consistency in the PROC Report, I'd like to always have both rows.

Tom · Posted 11-10-2021 01:00 PM

You will generally have more control if you just write a program to do the counting and just use PROC REPORT or other tools to make the presentation.

So for this example where there is one variable with a fixed set of values that might not all be present you can first get the summary of values that are. This will also produce the set of values for the other grouping variables that want to use as the basis for adding in the empty cells.

proc summary data=have nway ;
  class id name type ;
  output out=step1 ;
run;

So in this case we will get all of the ID*NAME combinations but some of them will only have some of the levels of TYPE. So let's use that to build a dataset that has all of the possible levels for TYPE for all of the those combinations of ID and NAME and just set the count to zero.

data step2;
  set step1;
  by id name;
  if first.name;
  _freq_=0;
  do type='A','B';
     output;
  end;
run;

Now we can combine the two and get the full dataset. The counts that actually existed will overwrite the zeros we generated in STEP2.

data report;
   merge step2 step1;
   by id name type;
run;

Now we can make our report.

proc print data=report label;
  var id name type _freq_;
  label _freq_='Count';
run;

View solution in original post

Reeza · Posted 11-09-2021 05:00 PM

If you have all the values, look at the COMPLETEROWS option on PROC REPORT?

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proc/n1dz7jdasx5t56n1rmlx346dyk6n.htm

scify · Posted 11-10-2021 10:59 AM

That's definitely a start! It works for the simple example I posted above, but if I try to add another grouping column (say, Name,which has a one-to-one match with ID), it breaks rather spectacularly. Do you know if there's a way to make it work with three variables (ID, Name, and Type) instead of two?

andreas_lds · Posted 11-10-2021 01:46 AM

You have to change the code summarising the data:

data work.Original;
   input ID Type $;
   datalines;
1          A
1          A
1          B
1          A
2          A
2          A
3          B
3          B
;

proc summary data=work.Original nway completetypes;
   class ID Type;
   output out= work.Counted(drop= _type_ rename=(_freq_ = count));
run;

Ksharp · Posted 11-10-2021 06:51 AM

data work.Original;
   input ID Type $;
   datalines;
1          A
1          A
1          B
1          A
2          A
2          A
3          B
3          B
;
proc freq data=Original;
table id*type/list sparse nocum nopercent;
run;

Tom · Posted 11-10-2021 10:46 AM

Something like this?

data have ;
  input ID Type $ @@;
cards;
1 A 1 A
1 B 1 A
2 A 2 A
3 B 3 B
;

proc format ;
  value $types 'A'='A' 'B'='B';
run;

proc report data=have completerows;
  column id type n ;
  define id / group ;
  define type / group format=$types. PRELOADFMT ;
run;

scify · Posted 11-10-2021 11:12 AM

Pretty much. I mentioned this above in my response to Reeza, but the issue I'm hitting now is that I've been asked to add another column (Name) which has a 1-to-1 match with ID, but is causing the Proc Report to go a little crazy in the output. Do you know if there's a way to handle that, other than dumping the entirety of Name into its own format and using preloadfmt there as well?

Reeza · Posted 11-10-2021 11:39 AM

If the data combination doesn't exist in the raw data you have to tell SAS somehow that it does exist and a format or classdata does that.
Or you pre-process the data, which may still require a format, classdata or SQL logic to mimic the functionality.

scify · Posted 11-10-2021 11:52 AM

data original;
	input ID Name $ Type $;
	datalines;
1   Timmy     A
1   Timmy     A
1   Timmy     B
1   Timmy     A
2   Sarah     B
2   Sarah     B
3   John      A
3   John      A
;

proc sql;
	create table summary as
	select ID, Name, Type, count(*) as Count from original
	group by 1,2,3;
quit;

proc format;
	value $type 'A'='A' 'B'='B';

proc report data=summary completerows;
	column ID Name Type (Count);
	define ID / group  ;
	define Name / group;
	DEFINE Type / preloadfmt format=$type. exclusive;
	define Count / display;
run;

The above gives me this as the output:

ID	Name	Type	Count
1	John		0
			0
			0
			0
	Sarah		0
			0
			0
			0
	Timmy	A	3
		B	1
2	John		0
... etc

Edit: And fixing my mistake of not labeling the Type column as a group variable produces this:

ID	Name	Type	Count
1	John	A	0
			0
			0
			0
		B	0
			0
			0
			0
	Sarah	A	0
etc...

Reeza · Posted 11-10-2021 11:59 AM

Yes, because COMPLETEROWS will essentially create a cartesian product between all your group variables . Which likely means you have missing TYPE somewhere in your data. If you cannot use the COMPLETEROWS option you have to use either CLASSDATA or PRELOADFMT ahead of time to structure your data for proc report. Or if you don't have ID as a group variable, as a display variable that would also work but then it would repeat.

PROC TABULATE with PRELOADFMT will probably give you what you need.

COMPLETEROWS
creates all possible combinations of the group variable values.

scify · Posted 11-10-2021 12:33 PM

Oddly, it looks like the order and type of the variables is impacting things. If I change my proc report to this:

proc report data=summary completerows;
	column Name Type (ID Count);
	define Name / group;
	DEFINE Type / group preloadfmt format=$type. exclusive;
	define ID / 'ID' ;
	define Count / 'Count';
run;

it works fine. Unfortunately, this won't work with my actual data--if I define ID as character, it goes right back to being broken, and in my real dataset, it's a character variable due to leading 0s. It also, of course, starts using it in break / summarize sections, which isn't ideal...

Reeza · Posted 11-10-2021 12:50 PM

Try removing COMPLETEROWS if you're using PRELOADFMT, both are not necessary.

scify · Posted 11-10-2021 12:55 PM

Removing COMPLETEROWS takes it back to the initial issue: it only displays the values of TYPE that exist for each ID/Name row, instead of having both.

Reeza · Posted 11-10-2021 01:27 PM

Show what you want from your Original data set please.

Tom · Posted 11-10-2021 01:00 PM

You will generally have more control if you just write a program to do the counting and just use PROC REPORT or other tools to make the presentation.

So for this example where there is one variable with a fixed set of values that might not all be present you can first get the summary of values that are. This will also produce the set of values for the other grouping variables that want to use as the basis for adding in the empty cells.

proc summary data=have nway ;
  class id name type ;
  output out=step1 ;
run;

So in this case we will get all of the ID*NAME combinations but some of them will only have some of the levels of TYPE. So let's use that to build a dataset that has all of the possible levels for TYPE for all of the those combinations of ID and NAME and just set the count to zero.

data step2;
  set step1;
  by id name;
  if first.name;
  _freq_=0;
  do type='A','B';
     output;
  end;
run;

Now we can combine the two and get the full dataset. The counts that actually existed will overwrite the zeros we generated in STEP2.

data report;
   merge step2 step1;
   by id name type;
run;

Now we can make our report.

proc print data=report label;
  var id name type _freq_;
  label _freq_='Count';
run;

Imputing missing values to 0 when grouping variable not present

Re: Imputing missing values to 0 when grouping variable not present

Re: Imputing missing values to 0 when grouping variable not present

Re: Imputing missing values to 0 when grouping variable not present

Re: Imputing missing values to 0 when grouping variable not present

Re: Imputing missing values to 0 when grouping variable not present

Re: Imputing missing values to 0 when grouping variable not present

Re: Imputing missing values to 0 when grouping variable not present

Re: Imputing missing values to 0 when grouping variable not present

Re: Imputing missing values to 0 when grouping variable not present

Re: Imputing missing values to 0 when grouping variable not present

Re: Imputing missing values to 0 when grouping variable not present

Re: Imputing missing values to 0 when grouping variable not present

Re: Imputing missing values to 0 when grouping variable not present

Re: Imputing missing values to 0 when grouping variable not present

Re: Imputing missing values to 0 when grouping variable not present

SAS Innovate 2025: Call for Content

Classroom Training Available!