BookmarkSubscribeRSS Feed
HeatherNewton
Quartz | Level 8

Hi am i right to say this, when a sas merge is many to one or one to many, oracle sql full join works. If many to many, then no normal sql can do that. I will have to create own sql command. I am thinking if the best way is use full outer join then set conditions to remove excess observations.

Any insights.. my friend suggests oracle merge into might do the trick, any tried?

14 REPLIES 14
Tom
Super User Tom
Super User

provide a description or link to "merge into" from oracle.

Provide more details on the type of merge you are trying to replicate.

Are there any non-key common variables between the two datasets?

Do you already have an counter variable in each dataset that can be used to insure the records are match in the proper order?

Also how much does it matter whether or not the SAS merge is replicated 100%?  Perhaps there are aspects of the result that you are not using that mean you don't actually need to replicate those things.

SASKiwi
PROC Star

@HeatherNewton wrote:
Hi am i right to say this, when a sas merge is many to one or one to many, oracle sql full join works. If many to many, then no normal sql can do that. I will have to create own sql command. I am thinking if the best way is use full outer join then set conditions to remove excess observations.

Any insights.. my friend suggests oracle merge into might do the trick, any tried?

You keep searching for an SQL solution that will work exactly the same as a SAS MERGE - believe me there isn't any. MERGE does coalescing of columns with the same name from each table without explicit coding, in SQL you have to code it.

 

Also "normal" SQL can do many-to-many joins without a problem. SAS MERGE can't unless you get complicated and use other statements.

 

The sooner you stop thinking of SQL in terms of DATA step processing, the sooner you will start to make progress.

Kurt_Bremser
Super User

You keep asking the same thing over and over again, although you have gotten many answers from all the senior folks here. This is somehow disturbing.

 

Take a few (think: thousands!) steps back and look at the big picture:

  • what is the data I get initially?
  • what is the information needed from this data?

Once this process or these processes are sufficiently defined, you can go to work to implement them in Oracle.

Any undocumented process in SAS, where nobody can tell you what it does (from the business/analytics side, without inspecting the SAS code) is most probably not worth bothering about anyway.

HeatherNewton
Quartz | Level 8
Sorry guys! I was hoping writing out sas match merge logic is easier than understanding what program is actually doing to the data. Also want to have same output for Comparison with production data.

In my project, i have lots and lots of sas programs with match merge which i cant tell why merge is used. Eg in a credit card project, on a daily basis, we merge the card application info dataset (which is accumulated daily) with new daily application data.

Eg data application;
merge application dataA dataB;
by application_ref_no;
run;

Basically from physical data, they are many to many. What i dont understand is why merge is applicable as only last entry of dataA and dataB would be used for the same application_ref_no..

So i cant tell if using other join could work and if i used full join i will have much bigger output.
Tom
Super User Tom
Super User

You have THREE datasets that each have multiple observations for the same APPLICATION number?

What would that even mean?

 

What do you mean by LAST one?

SQL does not have a concept of LAST.  If you have an ordering variable you can mimic it by selecting the MAX.

Is there some other variable (like a date?) that defines the order?

 

If you only want the last one then subset each input dateset to the last observation per APPLICATION and then it is a simple one-to-one (to-one) join.

HeatherNewton
Quartz | Level 8

I dont mean I want the last one only but for sas match merge if I have eg

data A     data B

1 a          1 d

1 b          1 e

1 c

 

result would be

 

1 a d

1 b e

1 c e

 

I mean the one extra on data A would merge to the last one in data B which is 1 e

 

this is a very strange kind of merge and when is this ever be useful

 

Kurt_Bremser
Super User

This is what a data step MERGE does. Keep in mind that the purest form of a MERGE has no BY statement and just sets two datasets side by side. This is also why the MERGE is called that and not JOIN.

The inner simplicity of the data step is also the reason why it performs so well, but the SAS coder needs to be aware of every aspect of its working.

Tom
Super User Tom
Super User

That is just how MERGE works.  The reason E is "carried forward" is because there are no more observations to read that would change its value.  Remember this feature is WHY the one to many merge works.  The variables contributed by the one observation dataset are never changed until the next BY group is reached.

 

But the question was WHY would YOU do that with bank application data?  What is the code trying to do?

Perhaps the code did not expect there to be multiple observations for the same by group from more than one input dataset.  Perhaps you are trying to use the code on the wrong type of data.

 

 

Reeza
Super User

@HeatherNewton wrote:

 

 

this is a very strange kind of merge and when is this ever be useful

 


Bingo. You will rarely every see a many to many merge and the few I've come across are actually mistakes where the data ended up being many to many, after time when it originally was not and someone missing that. I would take a look at your use cases and see if you have any many to many joins in data step and I suspect you do not and thus this is a moot point and your SQL joins will mostly be fine. 

 

In my experience the issues with converting data step to SQL usually originate around first/last/by group processing and retain/lag logic that uses row order.

 

 

HeatherNewton
Quartz | Level 8

I have now looked into my datasets and realised the below points:

 

Normally the merge are done on of two datasets with the same 30 variables by 2 to 4 variables on a daily basis. Wdata.pil_master seems accumulative and _pil_master which is a delta file of each day. 

 

Data wdata.pil_master;
    merge wdata.pil_master _pil_master;
    by org_code account_no plan_num ref_no;
run;


so am i right to say the followings:

if the entries in _pil_master is not new, it will amend those in wdata.pil_master

if the entries in _pil_master is new, it will be added to wdata.pil_master

 

is my understanding of merge in this case, correct?

 

 

 

 

 

 

 

HeatherNewton
Quartz | Level 8

I have no log as I dont have real data to test yet

SASKiwi
PROC Star

You can answer your own questions by creating some test data and running your MERGE code on that.

mkeintz
PROC Star

MATCH MERGE crucially depends on observation order, which SQL (to my knowledge) wholly ignores.

 

Now, I think it is possible to replicate the match merge using the SQL code below.  But it must be preceded by a pair of data step, to put sequence numbers for every obs in BY group

 

/* Make datasets A and B such that:
    ID=1 is one      to one
    ID=2 is one      to many (2)
    ID=3 is many (2) to one
    ID=4 is none     to many (2)
    ID=5 is many (2) to none
    ID=6 is many (3) to many (3) with identical freqs (N(a)=N(b)
    ID=7 is many (3) to many (2) with N(a)>N(b)
    ID=8 is many (2) to many (3) with N(a)<N(b)  */
data a;
  input id atext $7.  @@;
datalines;
1 Line1.1
2 Line2.1             
3 Line3.1  3 Line3.2

5 Line5.1  5 Line5.2  
6 Line6.1  6 Line6.2  6 Line6.3
7 Line7.1  7 Line7.2  7 Line7.3
8 Line8.1  8 Line8.2
run; 

data b;
  input id btext $7.  @@;
datalines;
1 Line1.1  
2 Line2.1  2 Line2.2  
3 Line3.1
4 Line4.1  4 Line4.2

6 Line6.1  6 Line6.2  6 Line6.3
7 Line7.1  7 Line7.2
8 Line8.1  8 Line8.2  8 Line8.3
run;


data viewa /view=viewa;
  set a;
  by id;
  if first.id then _seq1=1;
  else _seq1+1;
run;

data viewb/view=viewb;
  set b;
  by id;
  if first.id then _seq2=1;
  else _seq2+1;
run;

proc sql;
  create table want_sql
  as select coalesce(a.id,b.id) as id, atext,btext, _seq1, _seq2
  from  viewa as a full join viewb as b
  on  (a.id=b.id)
  group by calculated id
  having a._seq1=b._seq2  
      or a._seq1>max(b._seq2) and _seq2=max(b._seq2)
      or b._seq2>max(a._seq1) and _seq1=max(a._seq1)
  ;
quit;

The strategy here is to do exact matches on the BY variable (ID) and the calculated _SEQ variable.  Then, for whichever data set has a higher frequency for the BY-group-in-hand, match it to the maximum _SEQ is the "smaller" dataset.

 

 I think that this is a way to show SQL programmers what a simple match-merge does.  

 

You can test the results of the above via:

 

proc sort data=want_sql;
  by id _seq1 _seq2;
run;

data want_merge;
  merge a b;
  by id;
run;

proc compare base=want_merge compare=want_sql (drop=_seq1 _seq2) notes noprint;
run;

The PROC COMPARE reports:

NOTE: No unequal values were found. All values compared are exactly equal.
NOTE: The data sets WORK.WANT_MERGE and WORK.WANT_SQL are equal in all respects.
NOTE: There were 18 observations read from the data set WORK.WANT_MERGE.
NOTE: There were 18 observations read from the data set WORK.WANT_SQL.

 

 

And while adding a third data set C to the match-merge approach is trivial

data want_merge;
  merge a b c;
  by id;
run;

 I do not want to think about replicating it in a single SELECT expression

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 1243 views
  • 1 like
  • 6 in conversation