dincooo
Obsidian | Level 7

Hi all,

There was a server change, so the tables were transferred to a new server. I am trying to compare the two copies of a table to see whether there is any mismatch.

The table has 823 variables and 157 mio observations.

Proc compare and proc means are taking too much time; they are still running.

How can I compare these two datasets faster? Do you have any ideas?

Best Regards,

Onur

1 ACCEPTED SOLUTION

Shmuel
Garnet | Level 18

If you use proc compare, I see no need to also run proc means in order to check that the two corresponding tables are identical.

To save time, you could try adding or removing some proc compare options, but I'm not sure you will save much.

The only way I know to be sure that two tables are identical is proc compare.

In some cases, comparing NOBS, NVARS and date_created / date_updated first may let you filter out damaged files before running proc compare.

One more point: how were the files moved or copied to the new server? Do you trust the way it was done?

If a hard disk was simply unmounted and remounted, there are no pairs of tables to compare.

Proc compare time is mainly I/O time. Is there a way to reduce the I/O time and make it faster?


4 REPLIES
Kurt_Bremser
Super User

If you run proc compare with at least one dataset being accessed via network, expect dismal performance.

 

Unless you have really old data, a physical copy of the .sas7bdat files should suffice, and nothing that changes data can conceivably happen there.

 

Or did you have a complete change of platforms (like z/OS to UNIX)?

RW9
Diamond | Level 26

Oh, to add. Why not use some sort of secure copy, and use MD5 or one of the other hashing algorithms on the file to ensure it's the same once copied? That should all be part of the migration process you are working on (what does your migration plan say about migration testing?).
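
As an illustration of the hashing idea, here is a minimal sketch in Python (outside SAS, and not something anyone in this thread posted). The helper names `file_digest` and `same_content` are made up for this example, and it assumes the table was moved as a plain file-level copy of the .sas7bdat file. If the table was rewritten by SAS on the new server instead, the bytes may differ even when the data matches, so a mismatch here would only mean "look closer", not "the data is wrong".

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for very large files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b, algorithm="md5"):
    """True when both files produce the same digest."""
    return file_digest(path_a, algorithm) == file_digest(path_b, algorithm)
```

The digest can be computed separately on the old and the new server and only the short hex strings compared, so neither file has to be read across the network. Passing `algorithm="sha256"` works the same way if MD5 is not acceptable at your site.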

 

Question 1 - why does the dataset have 800+ variables? That sounds like very bad data modelling to me (and will likely contribute to the time taken to do any processing on that dataset). However, with so many obs (if my reading below is correct) it sounds like you have just dumped every bit of data you have into one dataset.

What does this mean:

157 mio observations.

157 million observations? If so, then you're talking about big data and you need to look at tech specific to that, maybe Hadoop or something like that.

Proc compare/means should be as fast as possible in Base SAS. You could of course assign more resources, RAM/processors etc.

ballardw
Super User

Unless you changed operating systems, I would look to system tools for the comparison, such as FC on Windows.
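
Where no such system tool is at hand, a byte-level comparison in the same spirit can be sketched in Python. This is only an illustration under the assumption that both files are reachable from one machine; `files_identical` and `first_difference` are hypothetical helper names, and the code does not reproduce FC's output format.

```python
import filecmp


def files_identical(path_a, path_b):
    """Compare file contents byte for byte."""
    # shallow=False forces a content comparison instead of trusting
    # matching size and modification time from os.stat().
    return filecmp.cmp(path_a, path_b, shallow=False)


def first_difference(path_a, path_b, chunk_size=1024 * 1024):
    """Return the byte offset of the first difference, or None if identical."""
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if chunk_a != chunk_b:
                # Find the exact position inside the mismatching chunk.
                for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                    if x != y:
                        return offset + i
                # One file ended early: the difference starts at its end.
                return offset + min(len(chunk_a), len(chunk_b))
            if not chunk_a:
                return None  # both files ended with no difference found
            offset += len(chunk_a)
```

Keep the earlier warning about network access in mind: if one of the two files sits on a remote share, this reads the whole file over the network, so hashing each file locally is likely the cheaper check.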

 
