Hi,
I am currently in a project where we are migrating our SAS analytics from one Linux platform to another Linux platform.
As part of the migration, i was testing some SAS code to check we get the same output results when running a same query on new server.
To my surprise, what I found was that when I am trying to merge 2 datasets the resultant dataset is different from the one which was created in the old environment (althought the difference is of record being up and down). I have checked it using PROC COMPARE with EXACT method.
To make by above question more clear:
2 datasets A and B were merged. Both A and B were created from raw data in the new environment and where checked using PROC COMPARE to ensure both the datasets are same in old and new environment. Datasets A and B were also checked using PROC COMPARE between the environment and were perfect match.
Then A and B datasets were merged in new environment (C data created) and same thing was already done in old environment. When checked (C data between the env.) using PROC COMPARE it gave almost the entire data not matching. When i actually checked the records, i saw the record in both the datasets (C in old and C in new) were in different order (record 9 of C created in new env, is record 10 in C in old env. ).
I have been searching through web, but didn;t get anything on this.
Anybody has any idea what could be the issue.
Thanks,
Gaurav
You may experience pitfalls like different blocksizes in the files caused by the OS. This may lead to different physical locations (other page) for the records in the datasets.
If you except a certain order, tell proc sql so. Or did you do a data step merge?
Please describe what you mean by not "matching" in more detail. If you are referring to data where there are "small" differences between numeric values which have been sourced identically - differences of the order of 1E-12 or smaller it is most likely due to floating point precision differences between your Linux platforms. There are many references to this topic if you search through SAS Support including:
By matching I mean comparing the datasets to ensure the data values in the variables are same. I am using PROC COMPARE METHOD=EXACT for comparision. The difference is not related to small difference between the numeric values. Below is a view of what the difference looks like.
Dataset C in old environment: Dataset C in new environment:
ID ID
10001 10001
10002 10003
10003 10002
As you can see the difference is coming in the order of the records in the dataset.
Ideally if we are merging 2 dataset same in both the env. the resultant merged data should also be same. Just to add the merger is done using proc sql and doing a left join.
thanks,
Gaurav
Hi Gaurav
Performing a proc sql left join is not going to create a output dataset with the records in the same order each time.
Depending on the how the input datasets are sorted BEFORE the proc sql will effect the order of the records in the output dataset.
To test the results across the 2 environments in terms of the records being in the same order, you would want to add a "order by ID" statement to you sql query.
----------------------------------------------------------------------------------------------------------------------------------------------------
TABLE A LEFT JOIN TABLE B
ON ID
The left join simply states that you would want all the records from TABLE A to be in the output dataset
----------------------------------------------------------------------------------------------------------------------------------------------------
TABLE A LEFT JOIN TABLE B
ON ID
ORDER BY ID
This would output all the records from TABLE A AND the output dataset will be sorted by ID
Quinton
Quinton is correctly pointing at some common bad habits. A lot of code today is build on an trial/error/change approach and not by design thinking on what is going to be done.
This is resulting in code that is unstable and will give surprises when doing migrations. There are a lot of pitfalls around.
The mentioned missing order while using SQL (or using undocumented features like monotone) is one of them. Using different encoding an other.
You may experience pitfalls like different blocksizes in the files caused by the OS. This may lead to different physical locations (other page) for the records in the datasets.
If you except a certain order, tell proc sql so. Or did you do a data step merge?
thanks Kurt and everyone for the sharing the info. The issue was with the order of the data due to different option set for SORTSIZE in both the environment. By sorting the data by mutiple variables which makes the record unique the data is matching perfectly.
thanks,
Hi Gaurav,
The output dataset generated depends not only on OS but also some other factors.
It is always advisable to sort the datasets before going to compare them.
Hope this helps.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.