BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
brophymj
Quartz | Level 8

Hi

I'm trying to find an efficient of doing the following:

I'm carrying out a reconciliation on a data set with 2000 observations and on of the fields in the data set is a comment on the reconciliation and it can have the following comments

Premium matches

Mismatch due to..reason 1

Mismatch due to..reason 2

and so on

I want to create a table or dataset which I can export that has the summary information i.e. something that looks like this:

Rec SummaryFrequency% of Total
Match195297.5%
Mismatch - Reason 1221.1%
Mismatch - Reason 2281.4%
Total2000100%
Start Date20130101
End Date20131231

The start and end dates are taken as the lowest and highest values of one of the fields in the dataset with the transaction dates for each observation.

1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Yep, I assume dnuermic for the min and forgot the put,:

     set     REC_SUMMARY="Start Date",     /* Insert the min start date */

               FREQ=(select put(min(POLDTR),yymmdd10.) from OUT_BOTH)

View solution in original post

9 REPLIES 9
ballardw
Super User

With that second requirement it is time to post a description of your data.

The comment "some variable" for the max min dates can be interpretted several ways; a specific variable in which case name it, or multiple variables that you want to select from.

brophymj
Quartz | Level 8

Thanks ballardw. What I mean is that on of the fields in my data set is the transaction date for each transaction in my dataset. The fields looks like this:

POLDTR

20130302

20130211

20130807

20130202

20130117

...

...

The start date is the min of all these dates and the end date is the max.

Also, is there a way of including a 'total' row for the table as shown above?

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, I have just finished for the day, but thought I would throw out a couple of ideas (note this is is in no way complete, just gives a flavor.  Would need to see some test data and what reason 1/2 is and what constitutes a match):

data out_both out_a out_b;

     merge data1 (in=a) data2 (in=b);

     by by_vars;

     if a and b then out_both;

     else if a then out_a;

     else out_b;

run;

proc sql;

     create table RESULT

     (

          REC_SUMMARY char(200),

          FREQ char(200),

          PCENT char(200)

     );

     insert into RESULT

     set     REC_SUMMARY="Match",

               FREQ=(select put(count(*),best.) from OUT_BOTH),

               PCENT=(select put( (count(*) / &TOTAL.) *100,5.2)

                              from OUT_BOTH)

     set     REC_SUMMARY="Mismatch reason 1",

               FREQ=(select put(count(*),best.) from OUT_A)...

     set     REC_SUMMARY="Start Date",

               FREQ=(select min(DATE) from OUT_BOTH group by ..)...

quit;

brophymj
Quartz | Level 8


Hi RW9

Here is my dataset, just two colunms. I don't fully follow your code, I'm getting errors.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Ok, its even easier then:

proc sql;

     create table RESULT     /* Create an empty results table which we will populate by the statements further below */

     (

          REC_SUMMARY char(200),

          FREQ char(200),

          PCENT char(200)

     );

     insert into RESULT

     set     REC_SUMMARY="Match",          /* Here we start inserting the first row of the results which is those that match */

               FREQ=(select put(count(*),best.) from HAVE where COMMENT="MATCH"),   /* Count all records with MATCH */

               PCENT=put( (select count(*) from HAVE where COMMENT="MATCH") / (select count(*) from HAVE) *100,5.2)   /* Num rows with Match / all rows * 100 = percent */

     set     REC_SUMMARY="Mismatch reason 1",               /* Do same as above but using condition doe reason 1 - which I can't identify from what you have given */

               FREQ=(select put(count(*),best.) from OUT_A)...

     set     REC_SUMMARY="Start Date",     /* Insert the min start date */

               FREQ=(select min(POLDTR) from OUT_BOTH)

     set     REC_SUMMARY="End Date",     /* Insert the max start date */

               FREQ=(select max(POLDTR) from OUT_BOTH);

quit;

brophymj
Quartz | Level 8

Hi RW9

Thanks for your help. The code is working except for the piece that creates teh Start Date and End Date. I get an error message that reads "It is invalid to assign a numeric expression to a character value using the SET clause".

I have attached the dataset above.

Rregards

Matthew

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Yep, I assume dnuermic for the min and forgot the put,:

     set     REC_SUMMARY="Start Date",     /* Insert the min start date */

               FREQ=(select put(min(POLDTR),yymmdd10.) from OUT_BOTH)

Reeza
Super User

You need to post an example of what your full data looks like.

stat_sas
Ammonite | Level 13

Try this.

proc tabulate data=have out=want;
class comments;
table comments='Rec Summary' all='Total',(n='Frequency' pctn='% of Total');
run;

proc print data=want;
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1000 views
  • 0 likes
  • 5 in conversation