BookmarkSubscribeRSS Feed
pavan1
Obsidian | Level 7

Hello Folks,

 

Can some one provide me with the example program of data Reconciliation Process? which gives discrepancies as output dataset, between the records of two data sets.

 

Thanks in Advance.

 

10 REPLIES 10
TomKari
Onyx | Level 15

Can you provide an example of the input data that you have, and the kind of output that you require?

 

There are a number of excellent ways to do this in SAS.

 

Tom

pavan1
Obsidian | Level 7

Hello Tom,

 

We have two data sets here, client and vendor files. We have to compare both and give the discrepancies as shown below

like wise, we have to compare client-vendor and vendor-client (from both sides) and show the discrepancies data set.

 

client data set:

 

subject    visit    date_of_exam    modality
101    baseline    1-Jan-89    CT
101    week07    30-Apr-89    MRI
101    week13    5-May-89    PET
101    week27    12-Jun-89    ECHO
101    week27    12-Jun-89    MUGA

 

vendor dataset:

 

subject    visit    date_of exam    modality
101    baseline    5-Jan-89    CT
101    WEEK07    30-Apr-89    CT
101    week13    10-May-89    DUAL-CT
101    WEEK27    12-Jun-89    ECHO

 

Discrepancy Report: (Output Data set 😞

 

SUBJECT VISIT    DATE_OF_EXAM    MODALITY    TYPE_OF_DESCRIPANCY    DESCRIPTION
101    baseline    5-Jan-89                      CT            DATE OF EXAM DISCRIPANT    CLIENT HAS 01-JAN-89
101    week07    30-Apr-89                     CT             MODALITY DISCREPANT    CLIENT HAS MRI
101    week13    10-May-89                  DUAL-CT    MISSING IN CLIENT DATABASE    RECORD MISSING IN CLIENT(IF MORE      THAN                  ONE                                                                    VARIABLE IS DISCREPANT IT SHOULD BE SHOWN AS MISSING)
101    week27    12-Jun-89                    MUGA    MISSING IN VENDOR DATABASE    RECORD MISSING IN VENDOR

 

Please help me with this

 

Thank you

 

DataFlux_SME
Calcite | Level 5

A common approach that I've used many times in the past is as follows:

 

1. Get a unique list of all PKs from both data sources 

2. Left join data source 1 and 2 to the PK Lists and write a conditional logic to identify possible missing and changed values for each field 

3. summarize the discrepancy results for each record

 

Let me know if this helps.

 

pavan1
Obsidian | Level 7
hello DataFlux_SME,

Can you please provide me with the example program?
DataFlux_SME
Calcite | Level 5

Here is an Oracle SQL code / algorithm that you can use directly if your data is already in a database or convert to SAS to perform such reconciliation. It identifies missing records on either data-set as well as discrepancies in the modality column.

 

select pks.*,
       nvl(c.date_of_exam, v.date_of_exam) as date_of_exam,
       case
         when c.modality is null then
          'Missing in Client Database'
         when v.modaility is null then
          'Missing in Vendor Database'
         when c.modality <> v.modailty then
          'Client has ' || c.modality || ' and vendor has ' || v.modality
         else
          null
       end as type_of_discrepancy
       
       (select distinct subject, visit
          from client
        union
        select distinct subject, visit
          from vendor) pks left join client c on pks.subject = c.subject and pks.visit = c.visit left join vendor v on pks.subject = v.subject and pks.visit = v.visit
TomKari
Onyx | Level 15

This might be trickier than it looks:

 

To add one record to your example, what if you have

 

subject visit date_of_exam modality
101 week27 12-Jun-89 ECHO
101 week27 12-Jun-89 MUGA

 

subject visit date_of exam modality
101 WEEK27 12-Jun-89 ECHO
102 week27 12-Jun-89 MUGA

 

Is that an improper subject (102 should be 101), or a missing 101 MUGA in VENDOR and a missing 102 MUGA in CLIENT? And as soon as two variables don't match, it becomes even more challenging.

 

As a first step, I suggest you find the cases where ANYTHING doesn't match, and review them manually. If you don't have too many, you may be able to make logical decisions based on that. If there are a lot, you might be able to use this result to derive some hierarchical rules.

 

Tom

pavan1
Obsidian | Level 7

hello Tom,

 

The discrepancies should be shown subject wise by comparing the data from Client and Vendor

SASKiwi
PROC Star

Do you mean a simple row by row comparison between the two tables? If that is the case then PROC COMPARE can do that easily:

 

proc compare base = client
             compare = vendor
             ;
run;

Although I think your requirements are more like this:

proc compare base = client
             compare = vendor
             ;
  id subject visit;
run;

 

Samyuktha1
Calcite | Level 5
Can we use Reconcile statements for missing values , duplicate records , mismatches ?
SASKiwi
PROC Star

What do you mean by Reconcile statements? Please post examples.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 3523 views
  • 0 likes
  • 5 in conversation