BookmarkSubscribeRSS Feed
Binxie
Calcite | Level 5

Hello

I have a two extra large data sets (>9Million records) that I'm am trying to compare.  The data sets have over 300 columns in them.

I was able to run a proc compare that outputs the differences only by ID, however, because of the size of the data set and the restrictions on exporting at my company, I cannot get the data out to analyze on a record by record basis which columns are not matching.

 

Is there an easy way in SAS that I can only show the variables (by the ID key) that are different?  Any other suggestions on how I can view only the differences in the individual columns? 

 

Thanks

 

9 REPLIES 9
Reeza
Super User
Why can't you get the output? Have you tried suppressing the printed output and generating a data set to view instead?
Binxie
Calcite | Level 5

I can view the output fine.  I just need the ability to identify the differences easily.  Currently there are over 400,000 rows and in the 350+ columns there could be differences. I'm just trying to identify, by the ID which records in which columns have differences. 

 

If I could export, it would be easier to analyze, but I can't so I'm trying to figure out a way to just output by ID the column where the dif=XXX.

 

 

Reeza
Super User
Are you outputting your results to a data set using the OUT option? If not, that's likely what you want to do, since you can then filter that as desired. The OUT options are on the PROC COMPARE statement. If you don't know how, please show your current code and we can show you how to modify it.
Binxie
Calcite | Level 5

Yes, I'm outputting it to a data set.  The problem is because there is over 400,000 records and 350+ columns, I don't know the best way to find the differences.  How do I easily identify, by the Primary key, which columns have the differences without scrolling thru 400,000 rows and 350 columns. 

 

 

Reeza
Super User
Have you tried using the OUTDIF or OUTNOEQUAL options to limit the output to records with differences?
Binxie
Calcite | Level 5

Yes, I have outdif and outnoequal in my query.  There are actually 400,000 difs in the data. 

 

proc compare base=recs_a compare=recs_b

out=result outnoequal outbase outcomp outdif

noprint;

id POLICY_NUMBER TERM_NUMBER POLICY_RISK_IDENTIFIER EFFECTIVE_FROM EFFECTIVE_TO;

run;

 

 

Reeza
Super User
Try just having the outnoequal, I think outnoequal and outdif are the opposite of each other.
Reeza
Super User
Did you use the METHOD/FUZZ option so that minute differences are suppressed?
Binxie
Calcite | Level 5

These are actual differences.  The two data sets need to be identical.  It's a new view that was created from an old one, and we are testing that the landed data is matching as the business expects it to.

 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 2487 views
  • 0 likes
  • 2 in conversation