Help using Base SAS procedures

Compare two datasets

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 7
Accepted Solution

Compare two datasets

Hi all,

 

There was a server change, so tables are transfarred to new server. So I am trying to compare two tables to see if there is a mismatch or not.

 

The table has 823 variables and 157 mio observations.

 

Proc compare and proc means are taking too much time and ther are still working.

 

How can I compare those two datasets faster, do you have any idea?

 

Best Regards,

Onur


Accepted Solutions
Solution
‎12-16-2016 01:49 AM
Trusted Advisor
Posts: 1,573

Re: Compare two datasets

If you use proc compare I see no need to run proc means in oreder to check idetity of the two corresponding tables.

 

In order to save time, maybe you can sdd/remove some proc compare options, but I'm not sure you will save much time;

 

The eonly way I know to be sure that two tables are identical is by proc compare;

 

In some cases, maybe if you compare NOBS, NVARS and date_created / date_updated, 

you can filter damaged files from running proc compare;

 

One point more - how did the files moved/copied to the new server ? Do you trust the way it was done ?

In case of demounting/mounting a HD, ther are no couples of tables to compare.

 

Proc compare time is mainly I/O time. Is there a way to eliminate I/O time and make it faster ?

View solution in original post


All Replies
Super User
Posts: 7,824

Re: Compare two datasets

If you run proc compare with at least one dataset being accessed via network, expect dismal performance.

 

Unless you have really old data, a physical copy of the .sas7bdat files should suffice, and nothing that changes data can conceivably happen there.

 

Or did you have a complete change of platforms (like z/OS to UNIX)?

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Super User
Posts: 7,973

Re: Compare two datasets

[ Edited ]

Oh, to add.  Why not use some sort of secure copy, use MD5 or one of the hashing algorthms on the file to ensure its the same once copied?  Should all be in the migration process which you are working on (what does your migration plan state on migration testing?)

 

Question 1 - why does the dataset have 800+ variables - that sounds like very bad data modelling to me (and will likely contribute to the time taken to do any processing on that dataset), hower with so many obs (if my below is correct) it sounds like you have just dumped every bit of data you have in one dataset.

 

What does this mean:

157 mio observations.

 

157 million observations?  If so then your talking about big data and you need to look at tech specific to that, maybe Hadoop or something like that.  

Proc compare/means should be as fast as possible in Base SAS.  You could of course assign more resources, ram/processors etc.

Solution
‎12-16-2016 01:49 AM
Trusted Advisor
Posts: 1,573

Re: Compare two datasets

If you use proc compare I see no need to run proc means in oreder to check idetity of the two corresponding tables.

 

In order to save time, maybe you can sdd/remove some proc compare options, but I'm not sure you will save much time;

 

The eonly way I know to be sure that two tables are identical is by proc compare;

 

In some cases, maybe if you compare NOBS, NVARS and date_created / date_updated, 

you can filter damaged files from running proc compare;

 

One point more - how did the files moved/copied to the new server ? Do you trust the way it was done ?

In case of demounting/mounting a HD, ther are no couples of tables to compare.

 

Proc compare time is mainly I/O time. Is there a way to eliminate I/O time and make it faster ?

Super User
Posts: 11,343

Re: Compare two datasets

Unless you changed operating systems I would look to system tools for comparison such as FC in windows.

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 293 views
  • 0 likes
  • 5 in conversation