BookmarkSubscribeRSS Feed
SannaSanna
Quartz | Level 8

Hello!! 

 

I have two large tables I am trying to merge. Table 1 is called services and table 2 is customers.  I used the data step merge below and it appears that i am missing some records.  There can be multiple customers tied to the same service using the 'merge by unitA' but it seems that the code is picking and merging only on the first instance and missing subsequent customer records which should also be matched too but is missing- this is technically working but I am missing customer records.  Can this be correct? 

DATA YFYS YFNS NFYS;
MERGE Service1(in=a) customer1(in=b); by unitA;
IF A AND B THEN OUTPUT YFYS;
IF A=1 AND B=0 THEN OUTPUT YFNS;
IF A=0 AND B=1 THEN OUTPUT NFYS; RUN;

 

 

 

 

7 REPLIES 7
ballardw
Super User

Example data of both input and the output.

 

If you multiple records with the same value of UnitA in both data sets then likely the data step merge is not what you want as it was not designed for a many-to-many merge.

SannaSanna
Quartz | Level 8
I think thats it. its a one to many merge. What do you suggest as the best way to accomplish this?
mkeintz
PROC Star

@SannaSanna wrote:
I think thats it. its a one to many merge. What do you suggest as the best way to accomplish this?

If it really is a many-to-one merge (which way? many customers to a service -- or vice versa?), then the program you showed will accomplish what you apparently want.  You describe the problem as "I am missing some records".  That's not enough to make a diagnosis, let alone a prescription.

 

Please show a sample of the data that become missing after running the merge statement.   That's what @ballardw was asking for.

 

Help us help you.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
SannaSanna
Quartz | Level 8

Hi.  

 

Below is a sample of the data.  I am merging by 'Rx'.  My output table (want) is missing the second row for jen. 

 

customer table    
customer table race Rx  
jen w b12  
mik h xyy  
rob b d20  
       
service table    
Rx ph    
b12 wit    
b12 wan    
o18 chi    
xyy gon    
d20 rie    
       
Want      
jen w b12 wit
jen w b12 wan
mik h xyy gon
rob b d20 rie
Tom
Super User Tom
Super User

That does not look like a one to one or one to many relationship.  Multiple people could have the same RX.  And at least some of the RX have multiple PH values (whatever that is).  Use SQL to combine them, it will perform a many to many join.

 

Also learn how to share your data as simple data steps:

data customer;
  input id $ race $ Rx $;
cards;
jen w b12  
mik h xyy  
rob b d20  
;

data service;
  input Rx $ ph $;
cards; 
b12 wit    
b12 wan    
o18 chi    
xyy gon    
d20 rie
;

data want;
  input id $ race $ rx $ ph $;
cards;
jen w b12 wit
jen w b12 wan
mik h xyy gon
rob b d20 rie
;

Here is code to combine CUSTOMER and SERVICE based on match of RX.  I assumed you want all customers even if they do not have any matching service records so used a LEFT JOIN.

proc sql;
create table try as
  select a.id,a.race,a.rx,b.ph
  from customer a
  left join service b
    on a.rx = b.rx 
  order by 1,2,3 
;
quit;

Results:

Obs    id     race    Rx     ph

 1     jen     w      b12    wit
 2     jen     w      b12    wan
 3     mik     h      xyy    gon
 4     rob     b      d20    rie
SannaSanna
Quartz | Level 8
Hi. Thank you for your help! My datasets are extremely large and having difficulty with resources and I just provided a brief example of the data. Perhaps the set of data I provided was not a good example, but I have also used Proc SQL to tie the tables together and still missing records. The merge is technically correct using the parameters to merge, however, there are missing join records. Is it true that the SAS PDV runs through the data once and if successful moves on to the next record to merge? What happens if one of my tables have some exact same row records (customer & lots of demographics) and the other table is a table of providers. After merging by 'Rx' (providers Wit & Wan) serve the same customers and possibly on same day. When merging (sorted by provider) it seems the merge is successful for the first sorted provider (Wan) but all those customer records did not merge with the second provider(Wit). Is it that the SAS PDV moves on/finishes after initial succcessful merge even though there are second possible successful merge? I'm stumped.
mkeintz
PROC Star

@SannaSanna 

 

Let's discuss the first problem you identify:

 


@SannaSanna wrote:
... stuff deleted ...
but I have also used Proc SQL to tie the tables together and still missing records.

"Missing records" is way too amibiguous to diagnose.  But from what you have shown us, any missing records are not due to size of the data sets.  Please provide an explicit example of a record you expected in the result, but was missing.  And provide the initial records from which you expected it to be produced. 

 

You should be able to extract those records into a sample, and then quickly run both your SQL code and DATA step merge code that resulted in missing records. 

 

Show us those codes, and the log file that went with them.   Then diagnosis (and maybe prescription) is possible.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1032 views
  • 1 like
  • 4 in conversation