I have data that I need to convert to a summary with counts for rule (the number or records in each rule by store) and also a % of match by the bucket value.
I am do not know how to convert to summary like the example below. I have tried proc tabulate and proc transpose but have not been able to get the desired results.
any assistance will be greatly appreciated.
Thanks, Elliott
I need it to look like this with counts for rule level and number mismatch and total
the on the right the percent of bucket aging match/mismatch
Want:
|
Count by matching Rule Level |
|
% of match by aging bucket |
|||||||||||
Store# |
1 |
2 |
3 |
4 |
5 |
6 |
Mis-match |
Total |
0-5 |
6-15 |
16-30 |
31-60 |
>60 |
Mismatch |
00070 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
00144 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
00145 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
00163 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
00175 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
00251 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
00301 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Data I have looks like this:
Store Rule bktdays match bucket
00004 NOMATCH 0-5
00004 2 3 MATCH 0-5
00011 4 -75 MATCH 0-5
00014 4 -10 MATCH 0-5
00014 1 8 MATCH 6-15
00014 4 -42 MATCH 0-5
00014 4 -45 MATCH 0-5
00014 4 -47 MATCH 0-5
00058 3 0 MATCH 0-5
00070 5 -1 MATCH 0-5
00070 3 0 MATCH 0-5
00070 4 -2 MATCH 0-5
00070 3 0 MATCH 0-5
00070 1 0 MATCH 0-5
00070 4 -9 MATCH 0-5
00070 1 5 MATCH 0-5
00070 4 -14 MATCH 0-5
00070 2 2 MATCH 0-5
00070 4 -18 MATCH 0-5
00070 3 0 MATCH 0-5
00070 4 -21 MATCH 0-5
00070 1 0 MATCH 0-5
00070 4 -25 MATCH 0-5
00070 4 -25 MATCH 0-5
00070 3 3 MATCH 0-5
00070 1 1 MATCH 0-5
00070 4 -28 MATCH 0-5
00070 4 -30 MATCH 0-5
00070 3 2 MATCH 0-5
00070 4 0 MATCH 0-5
00070 4 -42 MATCH 0-5
00070 4 -48 MATCH 0-5
00070 1 0 MATCH 0-5
data have;
infile datalines dlm=',' dsd;
length Store $5 Rule 1. bktdays8. match $10 bucket $5;
input store match bucket $ rule 1. bktdays 8.;
datalines;
00004,,,,NOMATCH,0-5
00004,,2,3,MATCH,0-5
00011,,4,-75,MATCH,0-5
00014,,4,-10,MATCH,0-5
00014,,1,8,MATCH,6-15
00014,,4,82,MATCH,61-90
00014,,4,45,MATCH,31-60
00014,4,47,MATCH,31-60
00058,3,0,MATCH,0-5
00070,5,-1,MATCH,0-5
00070,3,0,MATCH,0-5
00070,4,-2,MATCH,0-5
00070,3,0,MATCH,0-5
00070,1,0,MATCH,0-5
00070,4,-9,MATCH,0-5
00070,1,5,MATCH,0-5
00070,,-14,NOMATCH,0-5
00070,2,2,MATCH,0-5
00070,4,-18,MATCH,0-5
00072,3,0,MATCH,0-5
00072,4,21,MATCH,16-30
00073,1,0,MATCH,0-5
00073,4,25,MATCH,16-30
00073,4,25,MATCH,16-30
00074,3,3,MATCH,0-5
00074,1,1,MATCH,0-5
00074,4,58,MATCH,31-60
00075,4,30,MATCH,16-30
00075,3,2,MATCH,0-5
00075,4,0,MATCH,0-5
00076,4,92,MATCH,>90
00076,4,48,MATCH,31-60
00076,1,0,MATCH,0-5
00077,4,52,MATCH,31-60
;
run;
The below does what I think you want, except for the rightmost column.
Assuming that there is a 1:1 equivalence between RULE being missing and a Mismatch status, then just make rule=99 for mis-match cases. To make rule 99's print out as "mismatch" in your table, just assign a user-define format for RULE.
The match percentage by age bucket can be seen as the average of a variable MATCH_PCT which takes on only two values: 1 (for matches) and 0 (mis-matches). Ask tabulate to report the mean of those 1's and 0's, and apply then PERCENT. format to that mean.
What is the rightmost column? It's listed as part of a column group titled "% of Match by aging Bucket", yet that column is not an age group. And it's not even a Match %, since it is headed "mismatch".
data need / view=need;
set work.summary_data ;
/*Put a leading space in front of '0-5' and '6-15' so that
lexicographic order replicates the intended age group order*/
if bucket='0-5' then bucket=' 0-5';
else if bucket='6-15' then bucket=' 6-15';
if rule=. then do;
rule=99 ; /*Use for mismatch*/
match_pct=0;
end;
else match_pct=1;
/*Right justify STORE variable to get expected sort order*/
store=right(store);
run ;
proc format ;
value rulefmt 99='MisMatch' other=[1.] ;
run;
proc tabulate data =need;
class rule store bucket ;
var match_pct;
table store
, rule*n=' '*f=comma6.0
all='Total'*n=' '*f=comma6.0
match_pct='% of match by aging bucket'*bucket=' '*mean=' '*f=percent6.4 ;
format rule rulefmt. ;
run;
Please revise your DATA HAVE step into a working piece of code. Among other issues, your datalines do not all have the same number of csv value (i.e. they don't have the same number of commas). There are other errors too that make it a bit ambiguous as to what advice to give you.
Please fix so that it runs.
And then post it via the "running man" icon.
here is the corrected data lines:
data have;
infile datalines dlm=',' dsd;
length Store $5 Rule $1 bktdays 8. match $10 bucket $5;
input store rule $ bktdays match $ bucket $ ;
datalines;
00004,,0,NOMATCH,0-5
00004,2,3,MATCH,0-5
00011,4,-75,MATCH,0-5
00014,4,-10,MATCH,0-5
00014,1,8,MATCH,6-15
00014,4,82,MATCH,61-90
00014,4,45,MATCH,31-60
00014,4,47,MATCH,31-60
00058,3,0,MATCH,0-5
00070,5,-1,MATCH,0-5
00070,3,0,MATCH,0-5
00070,4,-2,MATCH,0-5
00070,3,0,MATCH,0-5
00070,1,0,MATCH,0-5
00070,4,-9,MATCH,0-5
00070,1,5,MATCH,0-5
00070,,-14,NOMATCH,0-5
00070,2,2,MATCH,0-5
00070,4,-18,MATCH,0-5
00072,3,0,MATCH,0-5
00072,4,21,MATCH,16-30
00073,1,0,MATCH,0-5
00073,4,25,MATCH,16-30
00073,4,25,MATCH,16-30
00074,3,3,MATCH,0-5
00074,1,1,MATCH,0-5
00074,4,58,MATCH,31-60
00075,4,30,MATCH,16-30
00075,3,2,MATCH,0-5
00075,4,0,MATCH,0-5
00076,4,92,MATCH,>90
00076,4,48,MATCH,31-60
00076,1,0,MATCH,0-5
00077,4,52,MATCH,31-60
;
run;
I just tried your code, it does not work. Please test the code and confirm that it works.
What does "mismatch" mean?
You need to provide a bit of information as to exactly what "TOTAL" and "% of match by the bucket value" means. I might guess that Total is a count of records for the store but in case it is something else. But the requested percentage needs a clear definition of what each numerator and denominator used should be especially since you seem to imply that MISMATCH is a category under Bucket when it is a different variable.
I suspect that your data is in for some significant reformatting and/or additional variables to get a report like that.
here are some more details that may help, this is an image of the sample the report requestor provided:
Column H: is the number of mismatch or nomatch records for that store number.
Column J: %mismatch =H18/I18
Column I: Total of (both match and mismatched) is the number of records for 1-6 + #of mismatch
Starting at column K:
% of matched by aging bucket? - 7 matched out of 35 (sum of 1-6=7)/total? 7/35=.2 or 20%
The example shows 7 but they ask for a % so should show 20%
In the original sample I provided the format was a little different. but I think they just moved the mismatch% to the end of the buckets.
Thank you all for your help with this
I have worked on this all day and am making some progress but still need some help.
data work.summary_data;
infile datalines dlm=',' dsd truncover;
length PendingNo $9 CrUnit $5 Store $5 match $10 bucket $5 Rulechar $1 Rule 3 ;
input PendingNo CrUnit Store match bucket Rulechar $ Rule 1. ;
datalines;
253864481,4730,4,MATCH,0-5,2,2
212777089,4730,4,NOMATCH,0-5,,
301172842,4730,11,MATCH,0-5,4,4
281739414,4730,14,MATCH,6-15,1,1
261300392,4730,14,MATCH,0-5,4,4
288475892,4730,14,MATCH,0-5,4,4
313076925,4730,14,MATCH,0-5,4,4
23131926,4730,14,MATCH,0-5,4,4
273107812,4730,58,MATCH,0-5,3,3
226270010,4730,70,MATCH,0-5,1,1
276177164,4730,70,MATCH,0-5,1,1
92644582,4730,70,MATCH,0-5,1,1
137788903,4730,70,MATCH,0-5,1,1
63797089,4730,70,MATCH,0-5,1,1
114514092,4730,70,MATCH,6-15,1,1
159810023,4730,70,MATCH,0-5,1,1
239028597,4730,70,MATCH,6-15,1,1
291638142,4730,70,MATCH,0-5,1,1
306576926,4730,70,MATCH,0-5,2,2
184156646,4730,70,MATCH,0-5,3,3
214132768,4730,70,MATCH,0-5,3,3
51683419,4730,70,MATCH,0-5,3,3
137563134,4730,70,MATCH,0-5,3,3
217461698,4730,70,MATCH,0-5,3,3
109113011,4730,70,MATCH,0-5,3,3
149173652,4730,70,MATCH,0-5,3,3
154079135,4730,70,MATCH,0-5,3,3
189377914,4730,70,MATCH,0-5,3,3
204766847,4730,70,MATCH,0-5,3,3
189066497,4730,70,MATCH,0-5,4,4
251583641,4730,70,MATCH,0-5,4,4
306428966,4730,70,MATCH,0-5,4,4
46074769,4730,70,MATCH,0-5,4,4
72116021,4730,70,MATCH,0-5,4,4
112755683,4730,70,MATCH,0-5,4,4
112963673,4730,70,MATCH,0-5,4,4
147893714,4730,70,MATCH,0-5,4,4
162142576,4730,70,MATCH,0-5,4,4
233991350,4730,70,MATCH,0-5,4,4
283368603,4730,70,MATCH,0-5,4,4
38519097,4730,70,MATCH,0-5,4,4
78059359,4730,70,MATCH,0-5,4,4
104160371,4730,130,MATCH,0-5,4,4
119392631,4730,130,MATCH,0-5,4,4
184920636,4730,130,MATCH,0-5,4,4
81757207,4730,130,MATCH,0-5,4,4
96275766,4730,130,MATCH,0-5,4,4
96633756,4730,130,MATCH,0-5,4,4
214350928,4730,139,MATCH,0-5,1,1
13148566,4730,139,MATCH,0-5,3,3
164847505,4730,139,MATCH,0-5,4,4
184356706,4730,139,MATCH,0-5,4,4
67873620,4730,139,MATCH,0-5,4,4
72534031,4730,139,MATCH,0-5,4,4
102649143,4730,139,MATCH,0-5,4,4
182921667,4730,139,MATCH,0-5,4,4
207836158,4730,139,MATCH,0-5,4,4
238375000,4730,139,MATCH,0-5,4,4
278454232,4730,139,MATCH,0-5,4,4
298747613,4730,139,MATCH,0-5,4,4
251074531,4730,139,NOMATCH,0-5,,
293309254,4730,139,NOMATCH,0-5,,
204006727,4730,139,NOMATCH,0-5,,
106813447,4730,139,NOMATCH,0-5,,
43030128,4730,140,MATCH,6-15,1,1
291264232,4730,140,MATCH,0-5,1,1
296210900,4730,140,MATCH,0-5,2,2
99259380,4730,140,MATCH,16-30,3,3
51328329,4730,143,MATCH,16-30,1,1
;
run;
Proc tabulate data=summary_data;
title "Rule Analysis";
class store /missing;
class PendingNo rulechar bucket /missing;
var rule;
table store, rulechar * bucket
(all='Total Records' rule=''*(n="# of Matches"))
/misstext=' '
;
run;
This produces part of what I need, I am still needing to know how to create the % of mismatches and also I would like Rules 1 - 6 on the left then total then buckets then %mismatch.
This is what the current output looks like now, but I would like to have it look like what is below
here are images of what I have and below is what I am trying to get to
.
The below does what I think you want, except for the rightmost column.
Assuming that there is a 1:1 equivalence between RULE being missing and a Mismatch status, then just make rule=99 for mis-match cases. To make rule 99's print out as "mismatch" in your table, just assign a user-define format for RULE.
The match percentage by age bucket can be seen as the average of a variable MATCH_PCT which takes on only two values: 1 (for matches) and 0 (mis-matches). Ask tabulate to report the mean of those 1's and 0's, and apply then PERCENT. format to that mean.
What is the rightmost column? It's listed as part of a column group titled "% of Match by aging Bucket", yet that column is not an age group. And it's not even a Match %, since it is headed "mismatch".
data need / view=need;
set work.summary_data ;
/*Put a leading space in front of '0-5' and '6-15' so that
lexicographic order replicates the intended age group order*/
if bucket='0-5' then bucket=' 0-5';
else if bucket='6-15' then bucket=' 6-15';
if rule=. then do;
rule=99 ; /*Use for mismatch*/
match_pct=0;
end;
else match_pct=1;
/*Right justify STORE variable to get expected sort order*/
store=right(store);
run ;
proc format ;
value rulefmt 99='MisMatch' other=[1.] ;
run;
proc tabulate data =need;
class rule store bucket ;
var match_pct;
table store
, rule*n=' '*f=comma6.0
all='Total'*n=' '*f=comma6.0
match_pct='% of match by aging bucket'*bucket=' '*mean=' '*f=percent6.4 ;
format rule rulefmt. ;
run;
Thank you this is wonderful, as for the last column it is supposed to be the % of mismatches. I need to add that.
Also, how would the code change if they actually wanted a count on buckets instead of %match?
@Elliott wrote:
Thank you this is wonderful, as for the last column it is supposed to be the % of mismatches. I need to add that.
Then why is it included as a subheading under the group heading titled "% of match by aging bucket"?
If that is just a layout error, then in the DATA NEED step, make a new dummy variable "mismatch_dummy", with a one for mismatch, and zero otherwise. In proc tabulate, add the mismatch_dummy as an analysis variable, and ask for its mean with a PERCENT display.
Also, how would the code change if they actually wanted a count on buckets instead of %match?
Since the %match values are nothing more than the mean of a bunch of 1's (match) and 0's for the analysis variable (not class variable) match_dummy (which I called match_pct in my earlier note), then to get a "count" of matches instead, replace the code requesting a mean to a request for a sum. The sum of all those 1's will be the same as a count of those 1's. Don't forget to change the associated format request.
This is a common "trick" to use with proc tabulate. If match_dummy had been declared a class variable - which might have been the intuitive instinct, then proc tabulate would generate two columns (or rows) for it - one for the 0's and one for the 1's. But you want only one column. Using match_dummy as an analysis variable, and coding it only as 1's and 0's allows you to replicate a count (which is the normal use of a class variable) with a sum.
Of course, the same was done with the mismatch_dummy variable, except you want a mean instead of a sum.
To make the code clearer, take away all the quoted elements, and maybe the formats - then you'll see the operational code.
data work.summary_data;
infile datalines dlm=',' dsd truncover;
length PendingNo $9 CrUnit $5 Store $5 match $10 bucket $5 Rulechar $1 Rule 3 ;
input PendingNo CrUnit Store match bucket Rulechar $ Rule 1. ;
datalines;
253864481,4730,4,MATCH,0-5,2,2
212777089,4730,4,NOMATCH,0-5,,
301172842,4730,11,MATCH,0-5,4,4
281739414,4730,14,MATCH,6-15,1,1
261300392,4730,14,MATCH,0-5,4,4
288475892,4730,14,MATCH,0-5,4,4
313076925,4730,14,MATCH,0-5,4,4
23131926,4730,14,MATCH,0-5,4,4
273107812,4730,58,MATCH,0-5,3,3
226270010,4730,70,MATCH,0-5,1,1
276177164,4730,70,MATCH,0-5,1,1
92644582,4730,70,MATCH,0-5,1,1
137788903,4730,70,MATCH,0-5,1,1
63797089,4730,70,MATCH,0-5,1,1
114514092,4730,70,MATCH,6-15,1,1
159810023,4730,70,MATCH,0-5,1,1
239028597,4730,70,MATCH,6-15,1,1
291638142,4730,70,MATCH,0-5,1,1
306576926,4730,70,MATCH,0-5,2,2
184156646,4730,70,MATCH,0-5,3,3
214132768,4730,70,MATCH,0-5,3,3
51683419,4730,70,MATCH,0-5,3,3
137563134,4730,70,MATCH,0-5,3,3
217461698,4730,70,MATCH,0-5,3,3
109113011,4730,70,MATCH,0-5,3,3
149173652,4730,70,MATCH,0-5,3,3
154079135,4730,70,MATCH,0-5,3,3
189377914,4730,70,MATCH,0-5,3,3
204766847,4730,70,MATCH,0-5,3,3
189066497,4730,70,MATCH,0-5,4,4
251583641,4730,70,MATCH,0-5,4,4
306428966,4730,70,MATCH,0-5,4,4
46074769,4730,70,MATCH,0-5,4,4
72116021,4730,70,MATCH,0-5,4,4
112755683,4730,70,MATCH,0-5,4,4
112963673,4730,70,MATCH,0-5,4,4
147893714,4730,70,MATCH,0-5,4,4
162142576,4730,70,MATCH,0-5,4,4
233991350,4730,70,MATCH,0-5,4,4
283368603,4730,70,MATCH,0-5,4,4
38519097,4730,70,MATCH,0-5,4,4
78059359,4730,70,MATCH,0-5,4,4
104160371,4730,130,MATCH,0-5,4,4
119392631,4730,130,MATCH,0-5,4,4
184920636,4730,130,MATCH,0-5,4,4
81757207,4730,130,MATCH,0-5,4,4
96275766,4730,130,MATCH,0-5,4,4
96633756,4730,130,MATCH,0-5,4,4
214350928,4730,139,MATCH,0-5,1,1
13148566,4730,139,MATCH,0-5,3,3
164847505,4730,139,MATCH,0-5,4,4
184356706,4730,139,MATCH,0-5,4,4
67873620,4730,139,MATCH,0-5,4,4
72534031,4730,139,MATCH,0-5,4,4
102649143,4730,139,MATCH,0-5,4,4
182921667,4730,139,MATCH,0-5,4,4
207836158,4730,139,MATCH,0-5,4,4
238375000,4730,139,MATCH,0-5,4,4
278454232,4730,139,MATCH,0-5,4,4
298747613,4730,139,MATCH,0-5,4,4
251074531,4730,139,NOMATCH,0-5,,
293309254,4730,139,NOMATCH,0-5,,
204006727,4730,139,NOMATCH,0-5,,
106813447,4730,139,NOMATCH,0-5,,
43030128,4730,140,MATCH,6-15,1,1
291264232,4730,140,MATCH,0-5,1,1
296210900,4730,140,MATCH,0-5,2,2
99259380,4730,140,MATCH,16-30,3,3
51328329,4730,143,MATCH,16-30,1,1
run;
data need / view=need;
set work.summary_data ;
/*Put a leading space in front of '0-5' and '6-15' so that
lexicographic order replicates the intended age group order*/
if bucket='0-5' then bucket=' 0-5';
else if bucket='6-15' then bucket=' 6-15';
if rule=. then do;
rule=99 ; /*Use for mismatch*/
match_dummy=0;
end;
else match_dummy=1;
mismatch_dummy=1-match_dummy;
/*Right justify STORE variable to get expected sort order*/
store=right(store);
run ;
proc format ;
value rulefmt 99='MisMatch' other=[1.] ;
run;
proc tabulate data =need;
class rule store bucket ;
var match_dummy mismatch_dummy;
table store
, rule*n=' '*f=comma6.0
all='Total'*N=''*f=comma6.0
match_dummy='% of Match by Aging Bucket'*bucket=' '*sum=' '*f=comma6.0
mismatch_dummy='MisMatch %'*mean=' '*f=percent5.;
format rule rulefmt. ;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.