BookmarkSubscribeRSS Feed
Sapkota
Calcite | Level 5

I am perplexed to find a 'value' of a variable by ID (Lab_) in the merged file that does not exist in the original file.

In the merged file abc below, I see the value for a variable by a specific Lab_ that was not seen in the file4.

More precisely, the Lab_ 123213 did not show any value in file4 but now showing value in the merged file abc. I am wondering how to fix it.

Any suggetions would be highy useful. Thanks in advance.

data abc;

merge file4 (in=x) file1 file 2;

by Lab_;

in x;

run;

Sanjeeb

13 REPLIES 13
stat_sas
Ammonite | Level 13

Hi,

Not sure what is the main problem but see below what I found in your syntax does not look correct.

in x;  it should be  if x;

secondly file 2 is file2 right?

Thanks,

Naeem

Sapkota
Calcite | Level 5

Thanks. Yes those were typos. When I wrote the code in this post to illustrate the example I miswrote those --yes, I did if x; and yes, file2.

But the main problem is that the merged file is showing a value by 'Lab_' which is not found in the original file.

I was wondering what could be the possibility for a value to appear in a merged file when by a specific Lab_ when such is not present in the original file. Kindly suggest. Thanks in advance.

Reeza
Super User

Please post the exact code used.

Sapkota
Calcite | Level 5

Hi Reeza,

Please find the codes I used as attached. My issue is that the yellow highlighted Lab_ 1556686 (in the attached code) doe not have a value for Analyte = "Total VOCs as Toluene" in the original file exclu4, but is showing a value in "data Toltole" for Analyte ="Total VOCs as Toluene". I must have done some error but do not know what. Thanks in advance for your help.

Sanjeeb

Reeza
Super User

This is expected SAS behaviour, assuming that one of your other data sets also has the variable Analyte, it will overwrite the variable in the first file. 

If you're merging files make sure the variables in each file, besides the merge variables, are unique.

Reeza
Super User

It would be nice. Honestly I use SQL for all my merges.

Kurt_Bremser
Super User

Alas, the performance of SAS's SQL when doing large table joins has proven so inferior to proc sort/data step sequences (epsecially when several are running concurrently) that we only use SQL where absolutely required (eg cartesian join).

Sapkota
Calcite | Level 5

Yes, I too was expecting a note or a suggestions from SAS and was wondering why it did not provide.

Sapkota
Calcite | Level 5

Thank you for this.

art297
Opal | Level 21

You didn't ask to include it.  You stipulated "in x;", rather than what you wanted, namely:

if x;

jakarman
Barite | Level 11

I see 1556686 in the datasets exclutwo and in exclu4. 
There are differences with SQL and SAS to know (SQL is designed for a DBMS goal administration.) those are:

- The missing concept of statistics is missing in SQL. it is present in SAS. Notice you can many values in SAS being classified as missing.

- The join/merge is different:

   + SAS dataset join/merge will propagate values in the coalesce way updating missings automatically in the resulting datasets combining all information

   + When an ID is having missing values SAS will not drop those handle them is being something to do with, but SQL is dropping them.

   + SQL is possible generating a Cartesian-product with SAS join merge that is more difficult.
      When you the message that Cartesian product has ben used... Get on your alarm bells when  you are having performance/big data relations


Kurt mentioned already the big data  performance issue. With SQL data is thought not to be ordered. It is possible to get those results random.

You could parallelize some processing that is done with threading. Most people have problems to think like this.
With SAS you are thinking the data is ordered and you process them ordered (the automatic OBS). Most people are thinking like this.       

Your result is exactly what you have code as should being to happen.

The in selection is only a selection should come from this source, all other rules as described are applicable (the missings being replaces out of the other dataset)

---->-- ja karman --<-----
Sapkota
Calcite | Level 5

Thank you very much for taking a detail look. I will try this and see.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1094 views
  • 0 likes
  • 6 in conversation