☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Obsidian | Level 7

need help creating a summary

I have data that I need to convert to a summary with counts for rule (the number or records in each rule by store) and also a % of match by the bucket value.

I am do not know how to convert to summary like the example below.  I have tried proc tabulate and proc transpose but have not been able to get the desired results.

any assistance will be greatly appreciated.

Thanks, Elliott

I need it to look like this with counts for rule level and number mismatch and total

the on the right the percent of bucket aging match/mismatch

Want:

 Count by matching Rule Level % of match by aging bucket Store# 1 2 3 4 5 6 Mis-match Total 0-5 6-15 16-30 31-60 >60 Mismatch 00070 00144 00145 00163 00175 00251 00301

Data I have looks like this:

Store Rule bktdays match bucket
00004 NOMATCH 0-5
00004 2 3 MATCH 0-5
00011 4 -75 MATCH 0-5
00014 4 -10 MATCH 0-5
00014 1 8 MATCH 6-15
00014 4 -42 MATCH 0-5
00014 4 -45 MATCH 0-5
00014 4 -47 MATCH 0-5
00058 3 0 MATCH 0-5
00070 5 -1 MATCH 0-5
00070 3 0 MATCH 0-5
00070 4 -2 MATCH 0-5
00070 3 0 MATCH 0-5
00070 1 0 MATCH 0-5
00070 4 -9 MATCH 0-5
00070 1 5 MATCH 0-5
00070 4 -14 MATCH 0-5
00070 2 2 MATCH 0-5
00070 4 -18 MATCH 0-5
00070 3 0 MATCH 0-5
00070 4 -21 MATCH 0-5
00070 1 0 MATCH 0-5
00070 4 -25 MATCH 0-5
00070 4 -25 MATCH 0-5
00070 3 3 MATCH 0-5
00070 1 1 MATCH 0-5
00070 4 -28 MATCH 0-5
00070 4 -30 MATCH 0-5
00070 3 2 MATCH 0-5
00070 4 0 MATCH 0-5
00070 4 -42 MATCH 0-5
00070 4 -48 MATCH 0-5
00070 1 0 MATCH 0-5

data have;
infile datalines dlm=',' dsd;
length Store \$5 Rule 1. bktdays8. match \$10 bucket \$5;
input store match bucket \$ rule 1. bktdays 8.;
datalines;
00004,,,,NOMATCH,0-5
00004,,2,3,MATCH,0-5
00011,,4,-75,MATCH,0-5
00014,,4,-10,MATCH,0-5
00014,,1,8,MATCH,6-15
00014,,4,82,MATCH,61-90
00014,,4,45,MATCH,31-60
00014,4,47,MATCH,31-60
00058,3,0,MATCH,0-5
00070,5,-1,MATCH,0-5
00070,3,0,MATCH,0-5
00070,4,-2,MATCH,0-5
00070,3,0,MATCH,0-5
00070,1,0,MATCH,0-5
00070,4,-9,MATCH,0-5
00070,1,5,MATCH,0-5
00070,,-14,NOMATCH,0-5
00070,2,2,MATCH,0-5
00070,4,-18,MATCH,0-5
00072,3,0,MATCH,0-5
00072,4,21,MATCH,16-30
00073,1,0,MATCH,0-5
00073,4,25,MATCH,16-30
00073,4,25,MATCH,16-30
00074,3,3,MATCH,0-5
00074,1,1,MATCH,0-5
00074,4,58,MATCH,31-60
00075,4,30,MATCH,16-30
00075,3,2,MATCH,0-5
00075,4,0,MATCH,0-5
00076,4,92,MATCH,>90
00076,4,48,MATCH,31-60
00076,1,0,MATCH,0-5
00077,4,52,MATCH,31-60
;
run;

1 ACCEPTED SOLUTION

Accepted Solutions
PROC Star

Re: need help creating a summary - Updated information

The below does what I think you want, except for the rightmost  column.

Assuming that there is a 1:1 equivalence between RULE being missing and a Mismatch status, then just make rule=99 for mis-match cases.  To make rule 99's print out as "mismatch" in your table, just assign a user-define format for RULE.

The match percentage by age bucket can be seen as the average of a variable  MATCH_PCT which takes on only two values: 1 (for matches) and 0 (mis-matches).  Ask tabulate to report the mean of those 1's and 0's, and apply then PERCENT. format to that mean.

What is the rightmost column?  It's listed as part of a column group titled "% of Match by aging Bucket", yet that column is not an age group.  And it's not even a Match %, since it is headed "mismatch".

``````data need / view=need;
set work.summary_data ;

/*Put a leading space in front of '0-5' and '6-15' so that
lexicographic order replicates the intended age group order*/
if bucket='0-5' then bucket=' 0-5';
else if bucket='6-15' then bucket=' 6-15';

if rule=. then do;
rule=99 ; /*Use for mismatch*/
match_pct=0;
end;
else match_pct=1;

/*Right justify STORE variable to get expected sort order*/
store=right(store);
run ;

proc format ;
value rulefmt   99='MisMatch'  other=[1.] ;
run;

proc tabulate data =need;
class rule store bucket ;
var match_pct;
table   store
, rule*n=' '*f=comma6.0
all='Total'*n=' '*f=comma6.0
match_pct='% of match by aging bucket'*bucket=' '*mean=' '*f=percent6.4 ;
format rule rulefmt. ;
run;
``````

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
13 REPLIES 13
PROC Star

Re: need help creating a summary

Please revise your DATA HAVE step into a working piece of code.  Among other issues, your datalines do not all have the same number of csv value (i.e. they don't have the same number of commas).  There are other errors too that make it a bit ambiguous as to what advice to give you.

Please fix so that it runs.

And then post it via the "running man" icon.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Obsidian | Level 7

Re: need help creating a summary

here is the corrected data lines:

``````data have;

infile datalines dlm=',' dsd;

length Store \$5 Rule \$1 bktdays 8. match \$10 bucket \$5;

input store rule \$ bktdays match \$ bucket \$ ;

datalines;

00004,,0,NOMATCH,0-5

00004,2,3,MATCH,0-5

00011,4,-75,MATCH,0-5

00014,4,-10,MATCH,0-5

00014,1,8,MATCH,6-15

00014,4,82,MATCH,61-90

00014,4,45,MATCH,31-60

00014,4,47,MATCH,31-60

00058,3,0,MATCH,0-5

00070,5,-1,MATCH,0-5

00070,3,0,MATCH,0-5

00070,4,-2,MATCH,0-5

00070,3,0,MATCH,0-5

00070,1,0,MATCH,0-5

00070,4,-9,MATCH,0-5

00070,1,5,MATCH,0-5

00070,,-14,NOMATCH,0-5

00070,2,2,MATCH,0-5

00070,4,-18,MATCH,0-5

00072,3,0,MATCH,0-5

00072,4,21,MATCH,16-30

00073,1,0,MATCH,0-5

00073,4,25,MATCH,16-30

00073,4,25,MATCH,16-30

00074,3,3,MATCH,0-5

00074,1,1,MATCH,0-5

00074,4,58,MATCH,31-60

00075,4,30,MATCH,16-30

00075,3,2,MATCH,0-5

00075,4,0,MATCH,0-5

00076,4,92,MATCH,>90

00076,4,48,MATCH,31-60

00076,1,0,MATCH,0-5

00077,4,52,MATCH,31-60

;

run;``````
Diamond | Level 26

Re: need help creating a summary

I just tried your code, it does not work. Please test the code and confirm that it works.

--
Paige Miller
Obsidian | Level 7

Re: need help creating a summary

update to code has already been posted, sorry for the inconvenience
PROC Star

Re: need help creating a summary

What does "mismatch" mean?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Obsidian | Level 7

Re: need help creating a summary

the initial data before it was reduced to this data was to match record from 2 different data sets, the match/nomatch indicates if a match was found.
Super User

Re: need help creating a summary

You need to provide a bit of information as to exactly what "TOTAL" and "% of match by the bucket value" means. I might guess that Total is a count of records for the store but in case it is something else. But the requested percentage needs a clear definition of what each numerator and denominator used should be especially since you seem to imply that MISMATCH is a category under Bucket when it is a different variable.

I suspect that your data is in for some significant reformatting and/or additional variables to get a report like that.

Obsidian | Level 7

Re: need help creating a summary

here are some more details that may help, this is an image of the sample the report requestor provided:

Column H: is the number of mismatch or nomatch records for that store number.

Column J: %mismatch =H18/I18

Column I:  Total of (both match and mismatched) is the number of records for 1-6 + #of mismatch

Starting at column K:

% of matched by aging bucket?  - 7 matched out of 35 (sum of 1-6=7)/total? 7/35=.2 or 20%

In the original sample I provided the format was a little different.  but I think they just moved the mismatch% to the end of the buckets.

Thank you all for your help with this

Obsidian | Level 7

need help creating a summary - Updated information

I have worked on this all day and am making some progress but still need some help.

``````data work.summary_data;
infile datalines dlm=',' dsd truncover;
length PendingNo \$9 CrUnit \$5 Store \$5 match \$10 bucket \$5 Rulechar \$1 Rule 3 ;
input PendingNo CrUnit Store match bucket Rulechar \$ Rule 1.  ;
datalines;
253864481,4730,4,MATCH,0-5,2,2
212777089,4730,4,NOMATCH,0-5,,
301172842,4730,11,MATCH,0-5,4,4
281739414,4730,14,MATCH,6-15,1,1
261300392,4730,14,MATCH,0-5,4,4
288475892,4730,14,MATCH,0-5,4,4
313076925,4730,14,MATCH,0-5,4,4
23131926,4730,14,MATCH,0-5,4,4
273107812,4730,58,MATCH,0-5,3,3
226270010,4730,70,MATCH,0-5,1,1
276177164,4730,70,MATCH,0-5,1,1
92644582,4730,70,MATCH,0-5,1,1
137788903,4730,70,MATCH,0-5,1,1
63797089,4730,70,MATCH,0-5,1,1
114514092,4730,70,MATCH,6-15,1,1
159810023,4730,70,MATCH,0-5,1,1
239028597,4730,70,MATCH,6-15,1,1
291638142,4730,70,MATCH,0-5,1,1
306576926,4730,70,MATCH,0-5,2,2
184156646,4730,70,MATCH,0-5,3,3
214132768,4730,70,MATCH,0-5,3,3
51683419,4730,70,MATCH,0-5,3,3
137563134,4730,70,MATCH,0-5,3,3
217461698,4730,70,MATCH,0-5,3,3
109113011,4730,70,MATCH,0-5,3,3
149173652,4730,70,MATCH,0-5,3,3
154079135,4730,70,MATCH,0-5,3,3
189377914,4730,70,MATCH,0-5,3,3
204766847,4730,70,MATCH,0-5,3,3
189066497,4730,70,MATCH,0-5,4,4
251583641,4730,70,MATCH,0-5,4,4
306428966,4730,70,MATCH,0-5,4,4
46074769,4730,70,MATCH,0-5,4,4
72116021,4730,70,MATCH,0-5,4,4
112755683,4730,70,MATCH,0-5,4,4
112963673,4730,70,MATCH,0-5,4,4
147893714,4730,70,MATCH,0-5,4,4
162142576,4730,70,MATCH,0-5,4,4
233991350,4730,70,MATCH,0-5,4,4
283368603,4730,70,MATCH,0-5,4,4
38519097,4730,70,MATCH,0-5,4,4
78059359,4730,70,MATCH,0-5,4,4
104160371,4730,130,MATCH,0-5,4,4
119392631,4730,130,MATCH,0-5,4,4
184920636,4730,130,MATCH,0-5,4,4
81757207,4730,130,MATCH,0-5,4,4
96275766,4730,130,MATCH,0-5,4,4
96633756,4730,130,MATCH,0-5,4,4
214350928,4730,139,MATCH,0-5,1,1
13148566,4730,139,MATCH,0-5,3,3
164847505,4730,139,MATCH,0-5,4,4
184356706,4730,139,MATCH,0-5,4,4
67873620,4730,139,MATCH,0-5,4,4
72534031,4730,139,MATCH,0-5,4,4
102649143,4730,139,MATCH,0-5,4,4
182921667,4730,139,MATCH,0-5,4,4
207836158,4730,139,MATCH,0-5,4,4
238375000,4730,139,MATCH,0-5,4,4
278454232,4730,139,MATCH,0-5,4,4
298747613,4730,139,MATCH,0-5,4,4
251074531,4730,139,NOMATCH,0-5,,
293309254,4730,139,NOMATCH,0-5,,
204006727,4730,139,NOMATCH,0-5,,
106813447,4730,139,NOMATCH,0-5,,
43030128,4730,140,MATCH,6-15,1,1
291264232,4730,140,MATCH,0-5,1,1
296210900,4730,140,MATCH,0-5,2,2
99259380,4730,140,MATCH,16-30,3,3
51328329,4730,143,MATCH,16-30,1,1
;
run;

Proc tabulate data=summary_data;
title  "Rule Analysis";
class store  /missing;
class PendingNo rulechar bucket /missing;
var rule;
table store, rulechar * bucket
(all='Total Records' rule=''*(n="# of Matches"))
/misstext=' '
;
run;This produces part of what I need, I am still needing to know how to create the % of mismatches and also I would like Rules 1 - 6 on the left then total then buckets then %mismatch.This is what the current output looks like now, but I would like to have it look like what is belowThis is what the query creates, but.This is what I am trying to get to``````
Obsidian | Level 7

Re: need help creating a summary - Updated information

here are images of what I have and below is what I am trying to get to

.current query produceswhere I am trying to get

PROC Star

Re: need help creating a summary - Updated information

The below does what I think you want, except for the rightmost  column.

Assuming that there is a 1:1 equivalence between RULE being missing and a Mismatch status, then just make rule=99 for mis-match cases.  To make rule 99's print out as "mismatch" in your table, just assign a user-define format for RULE.

The match percentage by age bucket can be seen as the average of a variable  MATCH_PCT which takes on only two values: 1 (for matches) and 0 (mis-matches).  Ask tabulate to report the mean of those 1's and 0's, and apply then PERCENT. format to that mean.

What is the rightmost column?  It's listed as part of a column group titled "% of Match by aging Bucket", yet that column is not an age group.  And it's not even a Match %, since it is headed "mismatch".

``````data need / view=need;
set work.summary_data ;

/*Put a leading space in front of '0-5' and '6-15' so that
lexicographic order replicates the intended age group order*/
if bucket='0-5' then bucket=' 0-5';
else if bucket='6-15' then bucket=' 6-15';

if rule=. then do;
rule=99 ; /*Use for mismatch*/
match_pct=0;
end;
else match_pct=1;

/*Right justify STORE variable to get expected sort order*/
store=right(store);
run ;

proc format ;
value rulefmt   99='MisMatch'  other=[1.] ;
run;

proc tabulate data =need;
class rule store bucket ;
var match_pct;
table   store
, rule*n=' '*f=comma6.0
all='Total'*n=' '*f=comma6.0
match_pct='% of match by aging bucket'*bucket=' '*mean=' '*f=percent6.4 ;
format rule rulefmt. ;
run;
``````

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Obsidian | Level 7

Re: need help creating a summary - Updated information

Thank you this is wonderful, as for the last column it is supposed to be the % of mismatches.  I need to add that.

Also, how would the code change if they actually wanted a count on buckets instead of %match?

PROC Star

Re: need help creating a summary - Updated information

@Elliott wrote:

Thank you this is wonderful, as for the last column it is supposed to be the % of mismatches.  I need to add that.

Then why is it included as a subheading under the group heading titled "% of match by aging bucket"?

If that is just a layout error, then in the DATA NEED step, make a new dummy variable "mismatch_dummy", with a one for mismatch, and zero otherwise.  In proc tabulate, add the mismatch_dummy as an analysis variable, and ask for its mean with a PERCENT display.

Also, how would the code change if they actually wanted a count on buckets instead of %match?

Since the %match values are nothing more than the mean of a bunch of 1's (match) and 0's for the analysis variable (not class variable) match_dummy (which I called match_pct in my earlier note), then to get a "count" of matches instead, replace the code requesting a mean to a request for a sum.  The sum of all those 1's will be the same as a count of those 1's.  Don't forget to change the associated format request.

This is a common "trick" to use with proc tabulate.  If match_dummy had been declared a class variable - which might have been the intuitive instinct, then proc tabulate would generate two columns (or rows) for it - one for the 0's and one for the 1's.  But you want only one column.  Using match_dummy as an analysis variable, and coding it only as 1's and 0's allows you to replicate a count (which is the normal use of a class variable) with a sum.

Of course, the same was done with the mismatch_dummy variable, except you want a mean instead of a sum.

To make the code clearer, take away all the quoted elements, and maybe the formats - then you'll see the operational code.

``````data work.summary_data;
infile datalines dlm=',' dsd truncover;
length PendingNo \$9 CrUnit \$5 Store \$5 match \$10 bucket \$5 Rulechar \$1 Rule 3 ;
input PendingNo CrUnit Store match bucket Rulechar \$ Rule 1.  ;
datalines;
253864481,4730,4,MATCH,0-5,2,2
212777089,4730,4,NOMATCH,0-5,,
301172842,4730,11,MATCH,0-5,4,4
281739414,4730,14,MATCH,6-15,1,1
261300392,4730,14,MATCH,0-5,4,4
288475892,4730,14,MATCH,0-5,4,4
313076925,4730,14,MATCH,0-5,4,4
23131926,4730,14,MATCH,0-5,4,4
273107812,4730,58,MATCH,0-5,3,3
226270010,4730,70,MATCH,0-5,1,1
276177164,4730,70,MATCH,0-5,1,1
92644582,4730,70,MATCH,0-5,1,1
137788903,4730,70,MATCH,0-5,1,1
63797089,4730,70,MATCH,0-5,1,1
114514092,4730,70,MATCH,6-15,1,1
159810023,4730,70,MATCH,0-5,1,1
239028597,4730,70,MATCH,6-15,1,1
291638142,4730,70,MATCH,0-5,1,1
306576926,4730,70,MATCH,0-5,2,2
184156646,4730,70,MATCH,0-5,3,3
214132768,4730,70,MATCH,0-5,3,3
51683419,4730,70,MATCH,0-5,3,3
137563134,4730,70,MATCH,0-5,3,3
217461698,4730,70,MATCH,0-5,3,3
109113011,4730,70,MATCH,0-5,3,3
149173652,4730,70,MATCH,0-5,3,3
154079135,4730,70,MATCH,0-5,3,3
189377914,4730,70,MATCH,0-5,3,3
204766847,4730,70,MATCH,0-5,3,3
189066497,4730,70,MATCH,0-5,4,4
251583641,4730,70,MATCH,0-5,4,4
306428966,4730,70,MATCH,0-5,4,4
46074769,4730,70,MATCH,0-5,4,4
72116021,4730,70,MATCH,0-5,4,4
112755683,4730,70,MATCH,0-5,4,4
112963673,4730,70,MATCH,0-5,4,4
147893714,4730,70,MATCH,0-5,4,4
162142576,4730,70,MATCH,0-5,4,4
233991350,4730,70,MATCH,0-5,4,4
283368603,4730,70,MATCH,0-5,4,4
38519097,4730,70,MATCH,0-5,4,4
78059359,4730,70,MATCH,0-5,4,4
104160371,4730,130,MATCH,0-5,4,4
119392631,4730,130,MATCH,0-5,4,4
184920636,4730,130,MATCH,0-5,4,4
81757207,4730,130,MATCH,0-5,4,4
96275766,4730,130,MATCH,0-5,4,4
96633756,4730,130,MATCH,0-5,4,4
214350928,4730,139,MATCH,0-5,1,1
13148566,4730,139,MATCH,0-5,3,3
164847505,4730,139,MATCH,0-5,4,4
184356706,4730,139,MATCH,0-5,4,4
67873620,4730,139,MATCH,0-5,4,4
72534031,4730,139,MATCH,0-5,4,4
102649143,4730,139,MATCH,0-5,4,4
182921667,4730,139,MATCH,0-5,4,4
207836158,4730,139,MATCH,0-5,4,4
238375000,4730,139,MATCH,0-5,4,4
278454232,4730,139,MATCH,0-5,4,4
298747613,4730,139,MATCH,0-5,4,4
251074531,4730,139,NOMATCH,0-5,,
293309254,4730,139,NOMATCH,0-5,,
204006727,4730,139,NOMATCH,0-5,,
106813447,4730,139,NOMATCH,0-5,,
43030128,4730,140,MATCH,6-15,1,1
291264232,4730,140,MATCH,0-5,1,1
296210900,4730,140,MATCH,0-5,2,2
99259380,4730,140,MATCH,16-30,3,3
51328329,4730,143,MATCH,16-30,1,1
run;

data need / view=need;
set work.summary_data ;

/*Put a leading space in front of '0-5' and '6-15' so that
lexicographic order replicates the intended age group order*/
if bucket='0-5' then bucket=' 0-5';
else if bucket='6-15' then bucket=' 6-15';

if rule=. then do;
rule=99 ; /*Use for mismatch*/
match_dummy=0;
end;
else match_dummy=1;

mismatch_dummy=1-match_dummy;

/*Right justify STORE variable to get expected sort order*/
store=right(store);
run ;

proc format ;
value rulefmt   99='MisMatch'  other=[1.] ;
run;

proc tabulate data =need;
class rule store bucket ;
var match_dummy mismatch_dummy;
table   store
, rule*n=' '*f=comma6.0
all='Total'*N=''*f=comma6.0
match_dummy='% of Match by Aging Bucket'*bucket=' '*sum=' '*f=comma6.0
mismatch_dummy='MisMatch %'*mean=' '*f=percent5.;
format rule rulefmt. ;
run;``````

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Discussion stats
• 13 replies
• 835 views
• 0 likes
• 4 in conversation