05-23-2014 05:20 PM
I am perplexed to find a value of a variable for an ID (Lab_) in the merged file that does not exist in the original file.
In the merged file abc below, I see a value for a variable for a specific Lab_ that was not present in file4.
More precisely, Lab_ 123213 did not show any value in file4 but now shows a value in the merged file abc. I am wondering how to fix it.
Any suggestions would be highly useful. Thanks in advance.
merge file4 (in=x) file1 file 2;
05-23-2014 05:31 PM
I am not sure what the main problem is, but here is what does not look correct in your syntax.
in x; it should be if x;
Secondly, file 2 is file2, right?
05-23-2014 05:48 PM
Thanks. Yes, those were typos. When I wrote the code in this post to illustrate the example I miswrote those. Yes, I did use if x; and yes, it is file2.
But the main problem is that the merged file is showing a value for a 'Lab_' which is not found in the original file.
I was wondering how a value could appear in a merged file for a specific Lab_ when no such value is present in the original file. Kindly suggest. Thanks in advance.
05-23-2014 06:38 PM
Please find the code I used attached. My issue is that the yellow highlighted Lab_ 1556686 (in the attached code) does not have a value for Analyte = "Total VOCs as Toluene" in the original file exclu4, but is showing a value in "data Toltole" for Analyte = "Total VOCs as Toluene". I must have made some error but do not know what. Thanks in advance for your help.
05-24-2014 11:22 PM
This is expected SAS behaviour. Assuming that one of your other data sets also has the variable Analyte, its value will overwrite the value from the first file.
If you're merging files, make sure the variables in each file, besides the merge (BY) variables, are unique.
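To illustrate the point above, here is a minimal sketch. The data values and the name Analyte_file1 are made up for the example; only the data set names file4 and file1 and the variables Lab_ and Analyte come from this thread, and the real files may of course look different.

```sas
/* Hypothetical toy data: both data sets carry a non-BY variable named Analyte */
data file4;
   input Lab_ Analyte :$20.;
   datalines;
101 Benzene
102 Xylene
;

data file1;
   input Lab_ Analyte :$20. Result;
   datalines;
101 Toluene 5.2
;

/* Both inputs are already sorted by Lab_, so no PROC SORT is needed here */
data abc;
   merge file4 (in=x) file1;
   by Lab_;
   if x;   /* keeps only Lab_ values present in file4; does not protect Analyte */
run;
/* For Lab_ 101, Analyte now holds the value read from file1, not the one from file4 */

/* One way to avoid the collision: rename the overlapping variable on input */
data abc_fixed;
   merge file4 (in=x) file1 (rename=(Analyte=Analyte_file1));
   by Lab_;
   if x;
run;
```

In abc_fixed both values survive side by side, so you can compare them. If the second copy is not needed at all, a drop=Analyte data set option on file1 would work as well.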
05-25-2014 09:03 AM
I think that SAS should at least give a note when this happens.
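For what it is worth, the MSGLEVEL=I system option should make the log report this. The sketch below uses made-up data sets named one and two (not the files from this thread), and the exact wording of the log message may differ between SAS versions.

```sas
/* Ask SAS to write additional informational messages to the log */
options msglevel=i;

/* Hypothetical toy data sets sharing the non-BY variable val */
data one;
   input id val;
   datalines;
1 10
2 20
;

data two;
   input id val;
   datalines;
1 99
;

data both;
   merge one two;
   by id;
run;
/* With MSGLEVEL=I the log is expected to contain an INFO line saying that */
/* variable val on data set WORK.ONE will be overwritten by WORK.TWO       */

/* Restore the default afterwards */
options msglevel=n;
```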
05-26-2014 02:02 AM
Alas, the performance of SAS's SQL when doing large table joins has proven so inferior to proc sort/data step sequences (especially when several are running concurrently) that we only use SQL where absolutely required (e.g. a Cartesian join).
05-26-2014 03:03 AM
I see 1556686 in the datasets exclutwo and in exclu4.
There are differences between SQL and SAS that you should know (SQL was designed for DBMS administration). Those are:
- The statistical concept of a missing value does not exist in SQL; it does exist in SAS. Note that SAS can classify many different values as missing.
- The join/merge is different:
+ A SAS data step merge combines the variables of all input data sets in the result; a non-BY variable that exists in more than one data set takes the value from the data set read last.
+ When an ID has missing values, SAS does not drop those rows but treats them as something to be processed; SQL drops them.
+ SQL can generate a Cartesian product; with a SAS merge that is more difficult.
When you see the message that a Cartesian product has been used, alarm bells should ring if performance matters and you are working with large data.
Kurt already mentioned the big data performance issue. With SQL, data is assumed not to be ordered, so results can come back in arbitrary order.
Some of that processing can be parallelized with threading. Most people find it hard to think this way.
With SAS you assume the data is ordered and you process it in order (the automatic OBS). Most people think this way.
Your result is exactly what you have coded to happen.
The IN= selection only determines that an observation must come from that source; all other rules described above still apply (values being replaced from the other data sets).