BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Elliott
Obsidian | Level 7

I have data that I need to convert to a summary with counts for rule (the number or records in each rule by store) and also a % of match by the bucket value.

I am do not know how to convert to summary like the example below.  I have tried proc tabulate and proc transpose but have not been able to get the desired results.

any assistance will be greatly appreciated.

 

Thanks, Elliott

 

I need it to look like this with counts for rule level and number mismatch and total 

the on the right the percent of bucket aging match/mismatch

Want:

Count by matching Rule Level

% of match by aging bucket

Store#

1

2

3

4

5

6

Mis-match

Total

0-5

6-15

16-30

31-60

>60

Mismatch

00070

00144

00145

00163

00175

00251

00301

 

Data I have looks like this:

Store Rule bktdays match bucket
00004 NOMATCH 0-5
00004 2 3 MATCH 0-5
00011 4 -75 MATCH 0-5
00014 4 -10 MATCH 0-5
00014 1 8 MATCH 6-15
00014 4 -42 MATCH 0-5
00014 4 -45 MATCH 0-5
00014 4 -47 MATCH 0-5
00058 3 0 MATCH 0-5
00070 5 -1 MATCH 0-5
00070 3 0 MATCH 0-5
00070 4 -2 MATCH 0-5
00070 3 0 MATCH 0-5
00070 1 0 MATCH 0-5
00070 4 -9 MATCH 0-5
00070 1 5 MATCH 0-5
00070 4 -14 MATCH 0-5
00070 2 2 MATCH 0-5
00070 4 -18 MATCH 0-5
00070 3 0 MATCH 0-5
00070 4 -21 MATCH 0-5
00070 1 0 MATCH 0-5
00070 4 -25 MATCH 0-5
00070 4 -25 MATCH 0-5
00070 3 3 MATCH 0-5
00070 1 1 MATCH 0-5
00070 4 -28 MATCH 0-5
00070 4 -30 MATCH 0-5
00070 3 2 MATCH 0-5
00070 4 0 MATCH 0-5
00070 4 -42 MATCH 0-5
00070 4 -48 MATCH 0-5
00070 1 0 MATCH 0-5

 

data have;
infile datalines dlm=',' dsd;
length Store $5 Rule 1. bktdays8. match $10 bucket $5;
input store match bucket $ rule 1. bktdays 8.;
datalines;
00004,,,,NOMATCH,0-5
00004,,2,3,MATCH,0-5
00011,,4,-75,MATCH,0-5
00014,,4,-10,MATCH,0-5
00014,,1,8,MATCH,6-15
00014,,4,82,MATCH,61-90
00014,,4,45,MATCH,31-60
00014,4,47,MATCH,31-60
00058,3,0,MATCH,0-5
00070,5,-1,MATCH,0-5
00070,3,0,MATCH,0-5
00070,4,-2,MATCH,0-5
00070,3,0,MATCH,0-5
00070,1,0,MATCH,0-5
00070,4,-9,MATCH,0-5
00070,1,5,MATCH,0-5
00070,,-14,NOMATCH,0-5
00070,2,2,MATCH,0-5
00070,4,-18,MATCH,0-5
00072,3,0,MATCH,0-5
00072,4,21,MATCH,16-30
00073,1,0,MATCH,0-5
00073,4,25,MATCH,16-30
00073,4,25,MATCH,16-30
00074,3,3,MATCH,0-5
00074,1,1,MATCH,0-5
00074,4,58,MATCH,31-60
00075,4,30,MATCH,16-30
00075,3,2,MATCH,0-5
00075,4,0,MATCH,0-5
00076,4,92,MATCH,>90
00076,4,48,MATCH,31-60
00076,1,0,MATCH,0-5
00077,4,52,MATCH,31-60
;
run;

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

The below does what I think you want, except for the rightmost  column.

 

Assuming that there is a 1:1 equivalence between RULE being missing and a Mismatch status, then just make rule=99 for mis-match cases.  To make rule 99's print out as "mismatch" in your table, just assign a user-define format for RULE.

 

The match percentage by age bucket can be seen as the average of a variable  MATCH_PCT which takes on only two values: 1 (for matches) and 0 (mis-matches).  Ask tabulate to report the mean of those 1's and 0's, and apply then PERCENT. format to that mean.

 

What is the rightmost column?  It's listed as part of a column group titled "% of Match by aging Bucket", yet that column is not an age group.  And it's not even a Match %, since it is headed "mismatch".

 

 

 

data need / view=need;
  set work.summary_data ;

  /*Put a leading space in front of '0-5' and '6-15' so that
    lexicographic order replicates the intended age group order*/
  if bucket='0-5' then bucket=' 0-5';
  else if bucket='6-15' then bucket=' 6-15';

  if rule=. then do;
    rule=99 ; /*Use for mismatch*/
    match_pct=0;
  end;
  else match_pct=1;

  /*Right justify STORE variable to get expected sort order*/
  store=right(store); 
run ;


proc format ;
  value rulefmt   99='MisMatch'  other=[1.] ;
run;

proc tabulate data =need;
  class rule store bucket ;
  var match_pct;
  table   store
        , rule*n=' '*f=comma6.0
          all='Total'*n=' '*f=comma6.0
          match_pct='% of match by aging bucket'*bucket=' '*mean=' '*f=percent6.4 ;
  format rule rulefmt. ;
run;

 

 

 
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

13 REPLIES 13
mkeintz
PROC Star

Please revise your DATA HAVE step into a working piece of code.  Among other issues, your datalines do not all have the same number of csv value (i.e. they don't have the same number of commas).  There are other errors too that make it a bit ambiguous as to what advice to give you.

 

Please fix so that it runs.

 

And then post it via the "running man" icon.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Elliott
Obsidian | Level 7

here is the corrected data lines:

data have;

infile datalines dlm=',' dsd;

length Store $5 Rule $1 bktdays 8. match $10 bucket $5;

input store rule $ bktdays match $ bucket $ ;

datalines;

00004,,0,NOMATCH,0-5

00004,2,3,MATCH,0-5

00011,4,-75,MATCH,0-5

00014,4,-10,MATCH,0-5

00014,1,8,MATCH,6-15

00014,4,82,MATCH,61-90

00014,4,45,MATCH,31-60

00014,4,47,MATCH,31-60

00058,3,0,MATCH,0-5

00070,5,-1,MATCH,0-5

00070,3,0,MATCH,0-5

00070,4,-2,MATCH,0-5

00070,3,0,MATCH,0-5

00070,1,0,MATCH,0-5

00070,4,-9,MATCH,0-5

00070,1,5,MATCH,0-5

00070,,-14,NOMATCH,0-5

00070,2,2,MATCH,0-5

00070,4,-18,MATCH,0-5

00072,3,0,MATCH,0-5

00072,4,21,MATCH,16-30

00073,1,0,MATCH,0-5

00073,4,25,MATCH,16-30

00073,4,25,MATCH,16-30

00074,3,3,MATCH,0-5

00074,1,1,MATCH,0-5

00074,4,58,MATCH,31-60

00075,4,30,MATCH,16-30

00075,3,2,MATCH,0-5

00075,4,0,MATCH,0-5

00076,4,92,MATCH,>90

00076,4,48,MATCH,31-60

00076,1,0,MATCH,0-5

00077,4,52,MATCH,31-60

;

run;
PaigeMiller
Diamond | Level 26

I just tried your code, it does not work. Please test the code and confirm that it works.

--
Paige Miller
Elliott
Obsidian | Level 7
update to code has already been posted, sorry for the inconvenience
mkeintz
PROC Star

What does "mismatch" mean?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Elliott
Obsidian | Level 7
the initial data before it was reduced to this data was to match record from 2 different data sets, the match/nomatch indicates if a match was found.
ballardw
Super User

You need to provide a bit of information as to exactly what "TOTAL" and "% of match by the bucket value" means. I might guess that Total is a count of records for the store but in case it is something else. But the requested percentage needs a clear definition of what each numerator and denominator used should be especially since you seem to imply that MISMATCH is a category under Bucket when it is a different variable.

 

I suspect that your data is in for some significant reformatting and/or additional variables to get a report like that.

Elliott
Obsidian | Level 7

here are some more details that may help, this is an image of the sample the report requestor provided:

 

Column H: is the number of mismatch or nomatch records for that store number.

Column J: %mismatch =H18/I18

Column I:  Total of (both match and mismatched) is the number of records for 1-6 + #of mismatch

Starting at column K:

% of matched by aging bucket?  - 7 matched out of 35 (sum of 1-6=7)/total? 7/35=.2 or 20%

 

image004.jpg

 

In the original sample I provided the format was a little different.  but I think they just moved the mismatch% to the end of the buckets.

 

Thank you all for your help with this

Elliott
Obsidian | Level 7

I have worked on this all day and am making some progress but still need some help.  

data work.summary_data;
  infile datalines dlm=',' dsd truncover;
  length PendingNo $9 CrUnit $5 Store $5 match $10 bucket $5 Rulechar $1 Rule 3 ;
  input PendingNo CrUnit Store match bucket Rulechar $ Rule 1.  ;
  datalines;
253864481,4730,4,MATCH,0-5,2,2
212777089,4730,4,NOMATCH,0-5,,
301172842,4730,11,MATCH,0-5,4,4
281739414,4730,14,MATCH,6-15,1,1
261300392,4730,14,MATCH,0-5,4,4
288475892,4730,14,MATCH,0-5,4,4
313076925,4730,14,MATCH,0-5,4,4
23131926,4730,14,MATCH,0-5,4,4
273107812,4730,58,MATCH,0-5,3,3
226270010,4730,70,MATCH,0-5,1,1
276177164,4730,70,MATCH,0-5,1,1
92644582,4730,70,MATCH,0-5,1,1
137788903,4730,70,MATCH,0-5,1,1
63797089,4730,70,MATCH,0-5,1,1
114514092,4730,70,MATCH,6-15,1,1
159810023,4730,70,MATCH,0-5,1,1
239028597,4730,70,MATCH,6-15,1,1
291638142,4730,70,MATCH,0-5,1,1
306576926,4730,70,MATCH,0-5,2,2
184156646,4730,70,MATCH,0-5,3,3
214132768,4730,70,MATCH,0-5,3,3
51683419,4730,70,MATCH,0-5,3,3
137563134,4730,70,MATCH,0-5,3,3
217461698,4730,70,MATCH,0-5,3,3
109113011,4730,70,MATCH,0-5,3,3
149173652,4730,70,MATCH,0-5,3,3
154079135,4730,70,MATCH,0-5,3,3
189377914,4730,70,MATCH,0-5,3,3
204766847,4730,70,MATCH,0-5,3,3
189066497,4730,70,MATCH,0-5,4,4
251583641,4730,70,MATCH,0-5,4,4
306428966,4730,70,MATCH,0-5,4,4
46074769,4730,70,MATCH,0-5,4,4
72116021,4730,70,MATCH,0-5,4,4
112755683,4730,70,MATCH,0-5,4,4
112963673,4730,70,MATCH,0-5,4,4
147893714,4730,70,MATCH,0-5,4,4
162142576,4730,70,MATCH,0-5,4,4
233991350,4730,70,MATCH,0-5,4,4
283368603,4730,70,MATCH,0-5,4,4
38519097,4730,70,MATCH,0-5,4,4
78059359,4730,70,MATCH,0-5,4,4
104160371,4730,130,MATCH,0-5,4,4
119392631,4730,130,MATCH,0-5,4,4
184920636,4730,130,MATCH,0-5,4,4
81757207,4730,130,MATCH,0-5,4,4
96275766,4730,130,MATCH,0-5,4,4
96633756,4730,130,MATCH,0-5,4,4
214350928,4730,139,MATCH,0-5,1,1
13148566,4730,139,MATCH,0-5,3,3
164847505,4730,139,MATCH,0-5,4,4
184356706,4730,139,MATCH,0-5,4,4
67873620,4730,139,MATCH,0-5,4,4
72534031,4730,139,MATCH,0-5,4,4
102649143,4730,139,MATCH,0-5,4,4
182921667,4730,139,MATCH,0-5,4,4
207836158,4730,139,MATCH,0-5,4,4
238375000,4730,139,MATCH,0-5,4,4
278454232,4730,139,MATCH,0-5,4,4
298747613,4730,139,MATCH,0-5,4,4
251074531,4730,139,NOMATCH,0-5,,
293309254,4730,139,NOMATCH,0-5,,
204006727,4730,139,NOMATCH,0-5,,
106813447,4730,139,NOMATCH,0-5,,
43030128,4730,140,MATCH,6-15,1,1
291264232,4730,140,MATCH,0-5,1,1
296210900,4730,140,MATCH,0-5,2,2
99259380,4730,140,MATCH,16-30,3,3
51328329,4730,143,MATCH,16-30,1,1
;
run;

Proc tabulate data=summary_data;
title  "Rule Analysis";
   class store  /missing;
   class PendingNo rulechar bucket /missing;
   var rule;
   table store, rulechar * bucket
       (all='Total Records' rule=''*(n="# of Matches"))
       /misstext=' '
   ;
run;

This produces part of what I need, I am still needing to know how to create the % of mismatches and also I would like Rules 1 - 6 on the left then total then buckets then %mismatch.

This is what the current output looks like now, but I would like to have it look like what is below
This is what the query creates, but.This is what the query creates, but.
This is what I am trying to get toThis is what I am trying to get to
Elliott
Obsidian | Level 7

here are images of what I have and below is what I am trying to get to

.current query producescurrent query produceswhere I am trying to getwhere I am trying to get

mkeintz
PROC Star

The below does what I think you want, except for the rightmost  column.

 

Assuming that there is a 1:1 equivalence between RULE being missing and a Mismatch status, then just make rule=99 for mis-match cases.  To make rule 99's print out as "mismatch" in your table, just assign a user-define format for RULE.

 

The match percentage by age bucket can be seen as the average of a variable  MATCH_PCT which takes on only two values: 1 (for matches) and 0 (mis-matches).  Ask tabulate to report the mean of those 1's and 0's, and apply then PERCENT. format to that mean.

 

What is the rightmost column?  It's listed as part of a column group titled "% of Match by aging Bucket", yet that column is not an age group.  And it's not even a Match %, since it is headed "mismatch".

 

 

 

data need / view=need;
  set work.summary_data ;

  /*Put a leading space in front of '0-5' and '6-15' so that
    lexicographic order replicates the intended age group order*/
  if bucket='0-5' then bucket=' 0-5';
  else if bucket='6-15' then bucket=' 6-15';

  if rule=. then do;
    rule=99 ; /*Use for mismatch*/
    match_pct=0;
  end;
  else match_pct=1;

  /*Right justify STORE variable to get expected sort order*/
  store=right(store); 
run ;


proc format ;
  value rulefmt   99='MisMatch'  other=[1.] ;
run;

proc tabulate data =need;
  class rule store bucket ;
  var match_pct;
  table   store
        , rule*n=' '*f=comma6.0
          all='Total'*n=' '*f=comma6.0
          match_pct='% of match by aging bucket'*bucket=' '*mean=' '*f=percent6.4 ;
  format rule rulefmt. ;
run;

 

 

 
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Elliott
Obsidian | Level 7

Thank you this is wonderful, as for the last column it is supposed to be the % of mismatches.  I need to add that.

 

Also, how would the code change if they actually wanted a count on buckets instead of %match?  

mkeintz
PROC Star

@Elliott wrote:

Thank you this is wonderful, as for the last column it is supposed to be the % of mismatches.  I need to add that.


Then why is it included as a subheading under the group heading titled "% of match by aging bucket"?

 

If that is just a layout error, then in the DATA NEED step, make a new dummy variable "mismatch_dummy", with a one for mismatch, and zero otherwise.  In proc tabulate, add the mismatch_dummy as an analysis variable, and ask for its mean with a PERCENT display.

 

 

Also, how would the code change if they actually wanted a count on buckets instead of %match?  


Since the %match values are nothing more than the mean of a bunch of 1's (match) and 0's for the analysis variable (not class variable) match_dummy (which I called match_pct in my earlier note), then to get a "count" of matches instead, replace the code requesting a mean to a request for a sum.  The sum of all those 1's will be the same as a count of those 1's.  Don't forget to change the associated format request.

 

This is a common "trick" to use with proc tabulate.  If match_dummy had been declared a class variable - which might have been the intuitive instinct, then proc tabulate would generate two columns (or rows) for it - one for the 0's and one for the 1's.  But you want only one column.  Using match_dummy as an analysis variable, and coding it only as 1's and 0's allows you to replicate a count (which is the normal use of a class variable) with a sum.

 

Of course, the same was done with the mismatch_dummy variable, except you want a mean instead of a sum.

 

To make the code clearer, take away all the quoted elements, and maybe the formats - then you'll see the operational code.

 

data work.summary_data;
  infile datalines dlm=',' dsd truncover;
  length PendingNo $9 CrUnit $5 Store $5 match $10 bucket $5 Rulechar $1 Rule 3 ;
  input PendingNo CrUnit Store match bucket Rulechar $ Rule 1.  ;
datalines;
253864481,4730,4,MATCH,0-5,2,2
212777089,4730,4,NOMATCH,0-5,,
301172842,4730,11,MATCH,0-5,4,4
281739414,4730,14,MATCH,6-15,1,1
261300392,4730,14,MATCH,0-5,4,4
288475892,4730,14,MATCH,0-5,4,4
313076925,4730,14,MATCH,0-5,4,4
23131926,4730,14,MATCH,0-5,4,4
273107812,4730,58,MATCH,0-5,3,3
226270010,4730,70,MATCH,0-5,1,1
276177164,4730,70,MATCH,0-5,1,1
92644582,4730,70,MATCH,0-5,1,1
137788903,4730,70,MATCH,0-5,1,1
63797089,4730,70,MATCH,0-5,1,1
114514092,4730,70,MATCH,6-15,1,1
159810023,4730,70,MATCH,0-5,1,1
239028597,4730,70,MATCH,6-15,1,1
291638142,4730,70,MATCH,0-5,1,1
306576926,4730,70,MATCH,0-5,2,2
184156646,4730,70,MATCH,0-5,3,3
214132768,4730,70,MATCH,0-5,3,3
51683419,4730,70,MATCH,0-5,3,3
137563134,4730,70,MATCH,0-5,3,3
217461698,4730,70,MATCH,0-5,3,3
109113011,4730,70,MATCH,0-5,3,3
149173652,4730,70,MATCH,0-5,3,3
154079135,4730,70,MATCH,0-5,3,3
189377914,4730,70,MATCH,0-5,3,3
204766847,4730,70,MATCH,0-5,3,3
189066497,4730,70,MATCH,0-5,4,4
251583641,4730,70,MATCH,0-5,4,4
306428966,4730,70,MATCH,0-5,4,4
46074769,4730,70,MATCH,0-5,4,4
72116021,4730,70,MATCH,0-5,4,4
112755683,4730,70,MATCH,0-5,4,4
112963673,4730,70,MATCH,0-5,4,4
147893714,4730,70,MATCH,0-5,4,4
162142576,4730,70,MATCH,0-5,4,4
233991350,4730,70,MATCH,0-5,4,4
283368603,4730,70,MATCH,0-5,4,4
38519097,4730,70,MATCH,0-5,4,4
78059359,4730,70,MATCH,0-5,4,4
104160371,4730,130,MATCH,0-5,4,4
119392631,4730,130,MATCH,0-5,4,4
184920636,4730,130,MATCH,0-5,4,4
81757207,4730,130,MATCH,0-5,4,4
96275766,4730,130,MATCH,0-5,4,4
96633756,4730,130,MATCH,0-5,4,4
214350928,4730,139,MATCH,0-5,1,1
13148566,4730,139,MATCH,0-5,3,3
164847505,4730,139,MATCH,0-5,4,4
184356706,4730,139,MATCH,0-5,4,4
67873620,4730,139,MATCH,0-5,4,4
72534031,4730,139,MATCH,0-5,4,4
102649143,4730,139,MATCH,0-5,4,4
182921667,4730,139,MATCH,0-5,4,4
207836158,4730,139,MATCH,0-5,4,4
238375000,4730,139,MATCH,0-5,4,4
278454232,4730,139,MATCH,0-5,4,4
298747613,4730,139,MATCH,0-5,4,4
251074531,4730,139,NOMATCH,0-5,,
293309254,4730,139,NOMATCH,0-5,,
204006727,4730,139,NOMATCH,0-5,,
106813447,4730,139,NOMATCH,0-5,,
43030128,4730,140,MATCH,6-15,1,1
291264232,4730,140,MATCH,0-5,1,1
296210900,4730,140,MATCH,0-5,2,2
99259380,4730,140,MATCH,16-30,3,3
51328329,4730,143,MATCH,16-30,1,1
run;


data need / view=need;
  set work.summary_data ;

  /*Put a leading space in front of '0-5' and '6-15' so that
    lexicographic order replicates the intended age group order*/
  if bucket='0-5' then bucket=' 0-5';
  else if bucket='6-15' then bucket=' 6-15';

  if rule=. then do;
    rule=99 ; /*Use for mismatch*/
    match_dummy=0;
  end;
  else match_dummy=1;

  mismatch_dummy=1-match_dummy;

  /*Right justify STORE variable to get expected sort order*/
  store=right(store); 
run ;


proc format ;
  value rulefmt   99='MisMatch'  other=[1.] ;
run;

proc tabulate data =need;
  class rule store bucket ;
  var match_dummy mismatch_dummy;
  table   store
        , rule*n=' '*f=comma6.0
          all='Total'*N=''*f=comma6.0
          match_dummy='% of Match by Aging Bucket'*bucket=' '*sum=' '*f=comma6.0
          mismatch_dummy='MisMatch %'*mean=' '*f=percent5.;
  format rule rulefmt. ;
run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 958 views
  • 0 likes
  • 4 in conversation