SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Program for Data Reconciliation Process

Reply
Occasional Contributor
Posts: 19

Program for Data Reconciliation Process

Hello Folks,

 

Can some one provide me with the example program of data Reconciliation Process? which gives discrepancies as output dataset, between the records of two data sets.

 

Thanks in Advance.

 

PROC Star
Posts: 1,288

Re: Program for Data Reconciliation Process

Can you provide an example of the input data that you have, and the kind of output that you require?

 

There are a number of excellent ways to do this in SAS.

 

Tom

Occasional Contributor
Posts: 19

Re: Program for Data Reconciliation Process

Hello Tom,

 

We have two data sets here, client and vendor files. We have to compare both and give the discrepancies as shown below

like wise, we have to compare client-vendor and vendor-client (from both sides) and show the discrepancies data set.

 

client data set:

 

subject    visit    date_of_exam    modality
101    baseline    1-Jan-89    CT
101    week07    30-Apr-89    MRI
101    week13    5-May-89    PET
101    week27    12-Jun-89    ECHO
101    week27    12-Jun-89    MUGA

 

vendor dataset:

 

subject    visit    date_of exam    modality
101    baseline    5-Jan-89    CT
101    WEEK07    30-Apr-89    CT
101    week13    10-May-89    DUAL-CT
101    WEEK27    12-Jun-89    ECHO

 

Discrepancy Report: (Output Data set ):

 

SUBJECT VISIT    DATE_OF_EXAM    MODALITY    TYPE_OF_DESCRIPANCY    DESCRIPTION
101    baseline    5-Jan-89                      CT            DATE OF EXAM DISCRIPANT    CLIENT HAS 01-JAN-89
101    week07    30-Apr-89                     CT             MODALITY DISCREPANT    CLIENT HAS MRI
101    week13    10-May-89                  DUAL-CT    MISSING IN CLIENT DATABASE    RECORD MISSING IN CLIENT(IF MORE      THAN                  ONE                                                                    VARIABLE IS DISCREPANT IT SHOULD BE SHOWN AS MISSING)
101    week27    12-Jun-89                    MUGA    MISSING IN VENDOR DATABASE    RECORD MISSING IN VENDOR

 

Please help me with this

 

Thank you

 

New Contributor
Posts: 3

Re: Program for Data Reconciliation Process

A common approach that I've used many times in the past is as follows:

 

1. Get a unique list of all PKs from both data sources 

2. Left join data source 1 and 2 to the PK Lists and write a conditional logic to identify possible missing and changed values for each field 

3. summarize the discrepancy results for each record

 

Let me know if this helps.

 

Occasional Contributor
Posts: 19

Re: Program for Data Reconciliation Process

Posted in reply to DataFlux_SME
hello DataFlux_SME,

Can you please provide me with the example program?
New Contributor
Posts: 3

Re: Program for Data Reconciliation Process

Here is an Oracle SQL code / algorithm that you can use directly if your data is already in a database or convert to SAS to perform such reconciliation. It identifies missing records on either data-set as well as discrepancies in the modality column.

 

select pks.*,
       nvl(c.date_of_exam, v.date_of_exam) as date_of_exam,
       case
         when c.modality is null then
          'Missing in Client Database'
         when v.modaility is null then
          'Missing in Vendor Database'
         when c.modality <> v.modailty then
          'Client has ' || c.modality || ' and vendor has ' || v.modality
         else
          null
       end as type_of_discrepancy
       
       (select distinct subject, visit
          from client
        union
        select distinct subject, visit
          from vendor) pks left join client c on pks.subject = c.subject and pks.visit = c.visit left join vendor v on pks.subject = v.subject and pks.visit = v.visit
PROC Star
Posts: 1,288

Re: Program for Data Reconciliation Process

This might be trickier than it looks:

 

To add one record to your example, what if you have

 

subject visit date_of_exam modality
101 week27 12-Jun-89 ECHO
101 week27 12-Jun-89 MUGA

 

subject visit date_of exam modality
101 WEEK27 12-Jun-89 ECHO
102 week27 12-Jun-89 MUGA

 

Is that an improper subject (102 should be 101), or a missing 101 MUGA in VENDOR and a missing 102 MUGA in CLIENT? And as soon as two variables don't match, it becomes even more challenging.

 

As a first step, I suggest you find the cases where ANYTHING doesn't match, and review them manually. If you don't have too many, you may be able to make logical decisions based on that. If there are a lot, you might be able to use this result to derive some hierarchical rules.

 

Tom

Occasional Contributor
Posts: 19

Re: Program for Data Reconciliation Process

hello Tom,

 

The discrepancies should be shown subject wise by comparing the data from Client and Vendor

Super User
Posts: 3,857

Re: Program for Data Reconciliation Process

[ Edited ]

Do you mean a simple row by row comparison between the two tables? If that is the case then PROC COMPARE can do that easily:

 

proc compare base = client
             compare = vendor
             ;
run;

Although I think your requirements are more like this:

proc compare base = client
             compare = vendor
             ;
  id subject visit;
run;

 

Ask a Question
Discussion stats
  • 8 replies
  • 239 views
  • 0 likes
  • 4 in conversation