Hi
I'm trying to find an efficient of doing the following:
I'm carrying out a reconciliation on a data set with 2000 observations and on of the fields in the data set is a comment on the reconciliation and it can have the following comments
Premium matches
Mismatch due to..reason 1
Mismatch due to..reason 2
and so on
I want to create a table or dataset which I can export that has the summary information i.e. something that looks like this:
Rec Summary | Frequency | % of Total |
---|---|---|
Match | 1952 | 97.5% |
Mismatch - Reason 1 | 22 | 1.1% |
Mismatch - Reason 2 | 28 | 1.4% |
Total | 2000 | 100% |
Start Date | 20130101 | |
End Date | 20131231 |
The start and end dates are taken as the lowest and highest values of one of the fields in the dataset with the transaction dates for each observation.
Yep, I assume dnuermic for the min and forgot the put,:
set REC_SUMMARY="Start Date", /* Insert the min start date */
FREQ=(select put(min(POLDTR),yymmdd10.) from OUT_BOTH)
With that second requirement it is time to post a description of your data.
The comment "some variable" for the max min dates can be interpretted several ways; a specific variable in which case name it, or multiple variables that you want to select from.
Thanks ballardw. What I mean is that on of the fields in my data set is the transaction date for each transaction in my dataset. The fields looks like this:
POLDTR
20130302
20130211
20130807
20130202
20130117
...
...
The start date is the min of all these dates and the end date is the max.
Also, is there a way of including a 'total' row for the table as shown above?
Well, I have just finished for the day, but thought I would throw out a couple of ideas (note this is is in no way complete, just gives a flavor. Would need to see some test data and what reason 1/2 is and what constitutes a match):
data out_both out_a out_b;
merge data1 (in=a) data2 (in=b);
by by_vars;
if a and b then out_both;
else if a then out_a;
else out_b;
run;
proc sql;
create table RESULT
(
REC_SUMMARY char(200),
FREQ char(200),
PCENT char(200)
);
insert into RESULT
set REC_SUMMARY="Match",
FREQ=(select put(count(*),best.) from OUT_BOTH),
PCENT=(select put( (count(*) / &TOTAL.) *100,5.2)
from OUT_BOTH)
set REC_SUMMARY="Mismatch reason 1",
FREQ=(select put(count(*),best.) from OUT_A)...
set REC_SUMMARY="Start Date",
FREQ=(select min(DATE) from OUT_BOTH group by ..)...
quit;
Hi RW9
Here is my dataset, just two colunms. I don't fully follow your code, I'm getting errors.
Ok, its even easier then:
proc sql;
create table RESULT /* Create an empty results table which we will populate by the statements further below */
(
REC_SUMMARY char(200),
FREQ char(200),
PCENT char(200)
);
insert into RESULT
set REC_SUMMARY="Match", /* Here we start inserting the first row of the results which is those that match */
FREQ=(select put(count(*),best.) from HAVE where COMMENT="MATCH"), /* Count all records with MATCH */
PCENT=put( (select count(*) from HAVE where COMMENT="MATCH") / (select count(*) from HAVE) *100,5.2) /* Num rows with Match / all rows * 100 = percent */
set REC_SUMMARY="Mismatch reason 1", /* Do same as above but using condition doe reason 1 - which I can't identify from what you have given */
FREQ=(select put(count(*),best.) from OUT_A)...
set REC_SUMMARY="Start Date", /* Insert the min start date */
FREQ=(select min(POLDTR) from OUT_BOTH)
set REC_SUMMARY="End Date", /* Insert the max start date */
FREQ=(select max(POLDTR) from OUT_BOTH);
quit;
Hi RW9
Thanks for your help. The code is working except for the piece that creates teh Start Date and End Date. I get an error message that reads "It is invalid to assign a numeric expression to a character value using the SET clause".
I have attached the dataset above.
Rregards
Matthew
Yep, I assume dnuermic for the min and forgot the put,:
set REC_SUMMARY="Start Date", /* Insert the min start date */
FREQ=(select put(min(POLDTR),yymmdd10.) from OUT_BOTH)
You need to post an example of what your full data looks like.
Try this.
proc tabulate data=have out=want;
class comments;
table comments='Rec Summary' all='Total',(n='Frequency' pctn='% of Total');
run;
proc print data=want;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.