I am perplexed to find a 'value' of a variable by ID (Lab_) in the merged file that does not exist in the original file.
In the merged file abc below, I see the value for a variable by a specific Lab_ that was not seen in the file4.
More precisely, the Lab_ 123213 did not show any value in file4 but now showing value in the merged file abc. I am wondering how to fix it.
Any suggetions would be highy useful. Thanks in advance.
data abc;
merge file4 (in=x) file1 file 2;
by Lab_;
in x;
run;
Sanjeeb
Hi,
Not sure what is the main problem but see below what I found in your syntax does not look correct.
in x; it should be if x;
secondly file 2 is file2 right?
Thanks,
Naeem
Thanks. Yes those were typos. When I wrote the code in this post to illustrate the example I miswrote those --yes, I did if x; and yes, file2.
But the main problem is that the merged file is showing a value by 'Lab_' which is not found in the original file.
I was wondering what could be the possibility for a value to appear in a merged file when by a specific Lab_ when such is not present in the original file. Kindly suggest. Thanks in advance.
Please post the exact code used.
Hi Reeza,
Please find the codes I used as attached. My issue is that the yellow highlighted Lab_ 1556686 (in the attached code) doe not have a value for Analyte = "Total VOCs as Toluene" in the original file exclu4, but is showing a value in "data Toltole" for Analyte ="Total VOCs as Toluene". I must have done some error but do not know what. Thanks in advance for your help.
Sanjeeb
This is expected SAS behaviour, assuming that one of your other data sets also has the variable Analyte, it will overwrite the variable in the first file.
If you're merging files make sure the variables in each file, besides the merge variables, are unique.
I think that SAS should at least give a note when this happens.
It would be nice. Honestly I use SQL for all my merges.
Alas, the performance of SAS's SQL when doing large table joins has proven so inferior to proc sort/data step sequences (epsecially when several are running concurrently) that we only use SQL where absolutely required (eg cartesian join).
Yes, I too was expecting a note or a suggestions from SAS and was wondering why it did not provide.
Thank you for this.
You didn't ask to include it. You stipulated "in x;", rather than what you wanted, namely:
if x;
I see 1556686 in the datasets exclutwo and in exclu4.
There are differences with SQL and SAS to know (SQL is designed for a DBMS goal administration.) those are:
- The missing concept of statistics is missing in SQL. it is present in SAS. Notice you can many values in SAS being classified as missing.
- The join/merge is different:
+ SAS dataset join/merge will propagate values in the coalesce way updating missings automatically in the resulting datasets combining all information
+ When an ID is having missing values SAS will not drop those handle them is being something to do with, but SQL is dropping them.
+ SQL is possible generating a Cartesian-product with SAS join merge that is more difficult.
When you the message that Cartesian product has ben used... Get on your alarm bells when you are having performance/big data relations
Kurt mentioned already the big data performance issue. With SQL data is thought not to be ordered. It is possible to get those results random.
You could parallelize some processing that is done with threading. Most people have problems to think like this.
With SAS you are thinking the data is ordered and you process them ordered (the automatic OBS). Most people are thinking like this.
Your result is exactly what you have code as should being to happen.
The in selection is only a selection should come from this source, all other rules as described are applicable (the missings being replaces out of the other dataset)
Thank you very much for taking a detail look. I will try this and see.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.