Same SAS code giving different results on different Linux machine..

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 5
Accepted Solution

Same SAS code giving different results on different Linux machine..

Hi,

I am currently in a project where we are migrating our SAS analytics from one Linux platform to another Linux platform.

As part of the migration, i was testing some SAS code to check we get the same output results when running a same query on new server.

To my surprise, what I found was that when I am trying to merge 2 datasets the resultant dataset is different from the one which was created in the old environment (althought the difference is of record being up and down). I have checked it using PROC COMPARE with EXACT method.

To make by above question more clear:

2 datasets A and B were merged. Both A and B were created from raw data in the new environment and where checked using PROC COMPARE to ensure both the datasets are same in old and new environment. Datasets A and B were also checked using PROC COMPARE between the environment and were perfect match.

Then A and B datasets were merged in new environment (C data created) and same thing was already done in old environment. When checked (C data between the env.) using PROC COMPARE it gave almost the entire data not matching. When i actually checked the records, i saw the record in both the datasets (C in old and C in new) were in different order (record 9 of C created in new env, is record 10 in C  in old env. ).

I have been searching through web, but didn;t get anything on this.

Anybody has any idea what could be the issue.

Thanks,

Gaurav


Accepted Solutions
Solution
‎11-14-2014 02:49 AM
Super User
Posts: 7,446

Re: Same SAS code giving different results on different Linux machine..

You may experience pitfalls like different blocksizes in the files caused by the OS. This may lead to different physical locations (other page) for the records in the datasets.

If you except a certain order, tell proc sql so. Or did you do a data step merge?

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers

View solution in original post


All Replies
Super User
Posts: 3,237

Re: Same SAS code giving different results on different Linux machine..

Please describe what you mean by not "matching" in more detail. If you are referring to data where there are "small" differences between numeric values which have been sourced identically - differences of the order of 1E-12 or smaller it is most likely due to floating point precision differences between your Linux platforms. There are many references to this topic if you search through SAS Support including:

http://support.sas.com/resources/papers/dealing-with-numeric-representation-error-in-sas-application...

Occasional Contributor
Posts: 5

Re: Same SAS code giving different results on different Linux machine..

By matching I mean comparing the datasets to ensure the data values in the variables are same. I am using PROC COMPARE METHOD=EXACT for comparision. The difference is not related to small difference between the numeric values. Below is a view of what the difference looks like.

Dataset C in old environment:                              Dataset C in new environment:

ID                                                                      ID

10001                                                                 10001

10002                                                                 10003

10003                                                                 10002

As you can see the difference is coming in the order of the records in the dataset.

Ideally if we are merging 2 dataset same in both the env. the resultant merged data should also be same. Just to add the merger is done using proc sql and doing a left join.

thanks,

Gaurav

Occasional Contributor
Posts: 12

Re: Same SAS code giving different results on different Linux machine..

Hi Gaurav

Performing a proc sql left join is not going to create a output dataset with the records in the same order each time.

Depending on the how the input datasets are sorted BEFORE the proc sql will effect the order of the records in the output dataset.

To test the results across the 2 environments in terms of the records being in the same order, you would want to add a "order by ID" statement to you sql query.

----------------------------------------------------------------------------------------------------------------------------------------------------

TABLE A LEFT JOIN TABLE B

ON ID

The left join simply states that you would want all the records from TABLE A to be in the output dataset

----------------------------------------------------------------------------------------------------------------------------------------------------

TABLE A LEFT JOIN TABLE B

ON ID

ORDER BY ID

This would output all the records from TABLE A AND the output dataset will be sorted by ID

Quinton

Valued Guide
Posts: 3,208

Re: Same SAS code giving different results on different Linux machine..

Quinton is correctly pointing at some common bad habits. A lot of code today is build on an trial/error/change approach and not by design thinking on what is going to be done.

This is resulting in code that is unstable and will give surprises when doing migrations.  There are a lot of pitfalls around.      

The mentioned missing order while using SQL (or using undocumented features like monotone) is one of them. Using different encoding an other.    

---->-- ja karman --<-----
Solution
‎11-14-2014 02:49 AM
Super User
Posts: 7,446

Re: Same SAS code giving different results on different Linux machine..

You may experience pitfalls like different blocksizes in the files caused by the OS. This may lead to different physical locations (other page) for the records in the datasets.

If you except a certain order, tell proc sql so. Or did you do a data step merge?

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Occasional Contributor
Posts: 5

Re: Same SAS code giving different results on different Linux machine..

thanks Kurt and everyone for the sharing the info. The issue was with the order of the data due to different option set for SORTSIZE  in both the environment. By sorting the data by mutiple variables which makes the record unique the data is matching perfectly.

thanks,

N/A
Posts: 1

Re: Same SAS code giving different results on different Linux machine..

Hi Gaurav,

The output dataset generated depends not only on OS but also some other factors.

It is always advisable to sort the datasets before going to compare them.

Hope this helps.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 802 views
  • 7 likes
  • 6 in conversation