Solved: Re: Same SAS code giving different results on different Linux machine....

GauravT · Posted 11-13-2014 11:37 PM

Hi,

I am currently in a project where we are migrating our SAS analytics from one Linux platform to another Linux platform.

As part of the migration, i was testing some SAS code to check we get the same output results when running a same query on new server.

To my surprise, what I found was that when I am trying to merge 2 datasets the resultant dataset is different from the one which was created in the old environment (althought the difference is of record being up and down). I have checked it using PROC COMPARE with EXACT method.

To make by above question more clear:

2 datasets A and B were merged. Both A and B were created from raw data in the new environment and where checked using PROC COMPARE to ensure both the datasets are same in old and new environment. Datasets A and B were also checked using PROC COMPARE between the environment and were perfect match.

Then A and B datasets were merged in new environment (C data created) and same thing was already done in old environment. When checked (C data between the env.) using PROC COMPARE it gave almost the entire data not matching. When i actually checked the records, i saw the record in both the datasets (C in old and C in new) were in different order (record 9 of C created in new env, is record 10 in C in old env. ).

I have been searching through web, but didn;t get anything on this.

Anybody has any idea what could be the issue.

Thanks,

Gaurav

Kurt_Bremser · Posted 11-14-2014 02:49 AM

You may experience pitfalls like different blocksizes in the files caused by the OS. This may lead to different physical locations (other page) for the records in the datasets.

If you except a certain order, tell proc sql so. Or did you do a data step merge?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

SASKiwi · Posted 11-14-2014 12:04 AM

Please describe what you mean by not "matching" in more detail. If you are referring to data where there are "small" differences between numeric values which have been sourced identically - differences of the order of 1E-12 or smaller it is most likely due to floating point precision differences between your Linux platforms. There are many references to this topic if you search through SAS Support including:

http://support.sas.com/resources/papers/dealing-with-numeric-representation-error-in-sas-application...

GauravT · Posted 11-14-2014 12:39 AM

By matching I mean comparing the datasets to ensure the data values in the variables are same. I am using PROC COMPARE METHOD=EXACT for comparision. The difference is not related to small difference between the numeric values. Below is a view of what the difference looks like.

Dataset C in old environment: Dataset C in new environment:

ID ID

10001 10001

10002 10003

10003 10002

As you can see the difference is coming in the order of the records in the dataset.

Ideally if we are merging 2 dataset same in both the env. the resultant merged data should also be same. Just to add the merger is done using proc sql and doing a left join.

thanks,

Gaurav

quintonsmit_gmail_com · Posted 11-14-2014 01:09 AM

Hi Gaurav

Performing a proc sql left join is not going to create a output dataset with the records in the same order each time.

Depending on the how the input datasets are sorted BEFORE the proc sql will effect the order of the records in the output dataset.

To test the results across the 2 environments in terms of the records being in the same order, you would want to add a "order by ID" statement to you sql query.

----------------------------------------------------------------------------------------------------------------------------------------------------

TABLE A LEFT JOIN TABLE B

ON ID

The left join simply states that you would want all the records from TABLE A to be in the output dataset

----------------------------------------------------------------------------------------------------------------------------------------------------

TABLE A LEFT JOIN TABLE B

ON ID

ORDER BY ID

This would output all the records from TABLE A AND the output dataset will be sorted by ID

Quinton

jakarman · Posted 11-14-2014 02:03 AM

Quinton is correctly pointing at some common bad habits. A lot of code today is build on an trial/error/change approach and not by design thinking on what is going to be done.

This is resulting in code that is unstable and will give surprises when doing migrations. There are a lot of pitfalls around.

The mentioned missing order while using SQL (or using undocumented features like monotone) is one of them. Using different encoding an other.

---->-- ja karman --<-----

Kurt_Bremser · Posted 11-14-2014 02:49 AM

You may experience pitfalls like different blocksizes in the files caused by the OS. This may lead to different physical locations (other page) for the records in the datasets.

If you except a certain order, tell proc sql so. Or did you do a data step merge?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

GauravT · Posted 12-01-2014 01:52 AM

thanks Kurt and everyone for the sharing the info. The issue was with the order of the data due to different option set for SORTSIZE in both the environment. By sorting the data by mutiple variables which makes the record unique the data is matching perfectly.

thanks,

KaushikNanduri · Posted 11-24-2014 10:42 PM

Hi Gaurav,

The output dataset generated depends not only on OS but also some other factors.

It is always advisable to sort the datasets before going to compare them.

Hope this helps.

SAS Innovate 2025: Call for Content

Classroom Training Available!