BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
GauravT
Calcite | Level 5

Hi,

I am currently in a project where we are migrating our SAS analytics from one Linux platform to another Linux platform.

As part of the migration, i was testing some SAS code to check we get the same output results when running a same query on new server.

To my surprise, what I found was that when I am trying to merge 2 datasets the resultant dataset is different from the one which was created in the old environment (althought the difference is of record being up and down). I have checked it using PROC COMPARE with EXACT method.

To make by above question more clear:

2 datasets A and B were merged. Both A and B were created from raw data in the new environment and where checked using PROC COMPARE to ensure both the datasets are same in old and new environment. Datasets A and B were also checked using PROC COMPARE between the environment and were perfect match.

Then A and B datasets were merged in new environment (C data created) and same thing was already done in old environment. When checked (C data between the env.) using PROC COMPARE it gave almost the entire data not matching. When i actually checked the records, i saw the record in both the datasets (C in old and C in new) were in different order (record 9 of C created in new env, is record 10 in C  in old env. ).

I have been searching through web, but didn;t get anything on this.

Anybody has any idea what could be the issue.

Thanks,

Gaurav

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

You may experience pitfalls like different blocksizes in the files caused by the OS. This may lead to different physical locations (other page) for the records in the datasets.

If you except a certain order, tell proc sql so. Or did you do a data step merge?

View solution in original post

7 REPLIES 7
SASKiwi
PROC Star

Please describe what you mean by not "matching" in more detail. If you are referring to data where there are "small" differences between numeric values which have been sourced identically - differences of the order of 1E-12 or smaller it is most likely due to floating point precision differences between your Linux platforms. There are many references to this topic if you search through SAS Support including:

http://support.sas.com/resources/papers/dealing-with-numeric-representation-error-in-sas-application...

GauravT
Calcite | Level 5

By matching I mean comparing the datasets to ensure the data values in the variables are same. I am using PROC COMPARE METHOD=EXACT for comparision. The difference is not related to small difference between the numeric values. Below is a view of what the difference looks like.

Dataset C in old environment:                              Dataset C in new environment:

ID                                                                      ID

10001                                                                 10001

10002                                                                 10003

10003                                                                 10002

As you can see the difference is coming in the order of the records in the dataset.

Ideally if we are merging 2 dataset same in both the env. the resultant merged data should also be same. Just to add the merger is done using proc sql and doing a left join.

thanks,

Gaurav

quintonsmit_gmail_com
Calcite | Level 5

Hi Gaurav

Performing a proc sql left join is not going to create a output dataset with the records in the same order each time.

Depending on the how the input datasets are sorted BEFORE the proc sql will effect the order of the records in the output dataset.

To test the results across the 2 environments in terms of the records being in the same order, you would want to add a "order by ID" statement to you sql query.

----------------------------------------------------------------------------------------------------------------------------------------------------

TABLE A LEFT JOIN TABLE B

ON ID

The left join simply states that you would want all the records from TABLE A to be in the output dataset

----------------------------------------------------------------------------------------------------------------------------------------------------

TABLE A LEFT JOIN TABLE B

ON ID

ORDER BY ID

This would output all the records from TABLE A AND the output dataset will be sorted by ID

Quinton

jakarman
Barite | Level 11

Quinton is correctly pointing at some common bad habits. A lot of code today is build on an trial/error/change approach and not by design thinking on what is going to be done.

This is resulting in code that is unstable and will give surprises when doing migrations.  There are a lot of pitfalls around.      

The mentioned missing order while using SQL (or using undocumented features like monotone) is one of them. Using different encoding an other.    

---->-- ja karman --<-----
Kurt_Bremser
Super User

You may experience pitfalls like different blocksizes in the files caused by the OS. This may lead to different physical locations (other page) for the records in the datasets.

If you except a certain order, tell proc sql so. Or did you do a data step merge?

GauravT
Calcite | Level 5

thanks Kurt and everyone for the sharing the info. The issue was with the order of the data due to different option set for SORTSIZE  in both the environment. By sorting the data by mutiple variables which makes the record unique the data is matching perfectly.

thanks,

KaushikNanduri
Calcite | Level 5

Hi Gaurav,

The output dataset generated depends not only on OS but also some other factors.

It is always advisable to sort the datasets before going to compare them.

Hope this helps.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 2951 views
  • 7 likes
  • 6 in conversation