Hi
I'm trying to find an efficient of doing the following:
I'm carrying out a reconciliation on a data set with 2000 observations and on of the fields in the data set is a comment on the reconciliation and it can have the following comments
Premium matches
Mismatch due to..reason 1
Mismatch due to..reason 2
and so on
I want to create a table or dataset which I can export that has the summary information i.e. something that looks like this:
Rec Summary | Frequency | % of Total |
---|---|---|
Match | 1952 | 97.5% |
Mismatch - Reason 1 | 22 | 1.1% |
Mismatch - Reason 2 | 28 | 1.4% |
Total | 2000 | 100% |
Start Date | 20130101 | |
End Date | 20131231 |
The start and end dates are taken as the lowest and highest values of one of the fields in the dataset with the transaction dates for each observation.
Yep, I assume dnuermic for the min and forgot the put,:
set REC_SUMMARY="Start Date", /* Insert the min start date */
FREQ=(select put(min(POLDTR),yymmdd10.) from OUT_BOTH)
With that second requirement it is time to post a description of your data.
The comment "some variable" for the max min dates can be interpretted several ways; a specific variable in which case name it, or multiple variables that you want to select from.
Thanks ballardw. What I mean is that on of the fields in my data set is the transaction date for each transaction in my dataset. The fields looks like this:
POLDTR
20130302
20130211
20130807
20130202
20130117
...
...
The start date is the min of all these dates and the end date is the max.
Also, is there a way of including a 'total' row for the table as shown above?
Well, I have just finished for the day, but thought I would throw out a couple of ideas (note this is is in no way complete, just gives a flavor. Would need to see some test data and what reason 1/2 is and what constitutes a match):
data out_both out_a out_b;
merge data1 (in=a) data2 (in=b);
by by_vars;
if a and b then out_both;
else if a then out_a;
else out_b;
run;
proc sql;
create table RESULT
(
REC_SUMMARY char(200),
FREQ char(200),
PCENT char(200)
);
insert into RESULT
set REC_SUMMARY="Match",
FREQ=(select put(count(*),best.) from OUT_BOTH),
PCENT=(select put( (count(*) / &TOTAL.) *100,5.2)
from OUT_BOTH)
set REC_SUMMARY="Mismatch reason 1",
FREQ=(select put(count(*),best.) from OUT_A)...
set REC_SUMMARY="Start Date",
FREQ=(select min(DATE) from OUT_BOTH group by ..)...
quit;
Hi RW9
Here is my dataset, just two colunms. I don't fully follow your code, I'm getting errors.
Ok, its even easier then:
proc sql;
create table RESULT /* Create an empty results table which we will populate by the statements further below */
(
REC_SUMMARY char(200),
FREQ char(200),
PCENT char(200)
);
insert into RESULT
set REC_SUMMARY="Match", /* Here we start inserting the first row of the results which is those that match */
FREQ=(select put(count(*),best.) from HAVE where COMMENT="MATCH"), /* Count all records with MATCH */
PCENT=put( (select count(*) from HAVE where COMMENT="MATCH") / (select count(*) from HAVE) *100,5.2) /* Num rows with Match / all rows * 100 = percent */
set REC_SUMMARY="Mismatch reason 1", /* Do same as above but using condition doe reason 1 - which I can't identify from what you have given */
FREQ=(select put(count(*),best.) from OUT_A)...
set REC_SUMMARY="Start Date", /* Insert the min start date */
FREQ=(select min(POLDTR) from OUT_BOTH)
set REC_SUMMARY="End Date", /* Insert the max start date */
FREQ=(select max(POLDTR) from OUT_BOTH);
quit;
Hi RW9
Thanks for your help. The code is working except for the piece that creates teh Start Date and End Date. I get an error message that reads "It is invalid to assign a numeric expression to a character value using the SET clause".
I have attached the dataset above.
Rregards
Matthew
Yep, I assume dnuermic for the min and forgot the put,:
set REC_SUMMARY="Start Date", /* Insert the min start date */
FREQ=(select put(min(POLDTR),yymmdd10.) from OUT_BOTH)
You need to post an example of what your full data looks like.
Try this.
proc tabulate data=have out=want;
class comments;
table comments='Rec Summary' all='Total',(n='Frequency' pctn='% of Total');
run;
proc print data=want;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.