Hello!!
I have two large tables I am trying to merge. Table 1 is called services and table 2 is customers. I used the data step merge below and it appears that i am missing some records. There can be multiple customers tied to the same service using the 'merge by unitA' but it seems that the code is picking and merging only on the first instance and missing subsequent customer records which should also be matched too but is missing- this is technically working but I am missing customer records. Can this be correct?
:
DATA YFYS YFNS NFYS;
MERGE Service1(in=a) customer1(in=b); by unitA;
IF A AND B THEN OUTPUT YFYS;
IF A=1 AND B=0 THEN OUTPUT YFNS;
IF A=0 AND B=1 THEN OUTPUT NFYS; RUN;
Example data of both input and the output.
If you multiple records with the same value of UnitA in both data sets then likely the data step merge is not what you want as it was not designed for a many-to-many merge.
@SannaSanna wrote:
I think thats it. its a one to many merge. What do you suggest as the best way to accomplish this?
If it really is a many-to-one merge (which way? many customers to a service -- or vice versa?), then the program you showed will accomplish what you apparently want. You describe the problem as "I am missing some records". That's not enough to make a diagnosis, let alone a prescription.
Please show a sample of the data that become missing after running the merge statement. That's what @ballardw was asking for.
Help us help you.
Hi.
Below is a sample of the data. I am merging by 'Rx'. My output table (want) is missing the second row for jen.
customer table | |||
customer table | race | Rx | |
jen | w | b12 | |
mik | h | xyy | |
rob | b | d20 | |
service table | |||
Rx | ph | ||
b12 | wit | ||
b12 | wan | ||
o18 | chi | ||
xyy | gon | ||
d20 | rie | ||
Want | |||
jen | w | b12 | wit |
jen | w | b12 | wan |
mik | h | xyy | gon |
rob | b | d20 | rie |
That does not look like a one to one or one to many relationship. Multiple people could have the same RX. And at least some of the RX have multiple PH values (whatever that is). Use SQL to combine them, it will perform a many to many join.
Also learn how to share your data as simple data steps:
data customer;
input id $ race $ Rx $;
cards;
jen w b12
mik h xyy
rob b d20
;
data service;
input Rx $ ph $;
cards;
b12 wit
b12 wan
o18 chi
xyy gon
d20 rie
;
data want;
input id $ race $ rx $ ph $;
cards;
jen w b12 wit
jen w b12 wan
mik h xyy gon
rob b d20 rie
;
Here is code to combine CUSTOMER and SERVICE based on match of RX. I assumed you want all customers even if they do not have any matching service records so used a LEFT JOIN.
proc sql;
create table try as
select a.id,a.race,a.rx,b.ph
from customer a
left join service b
on a.rx = b.rx
order by 1,2,3
;
quit;
Results:
Obs id race Rx ph 1 jen w b12 wit 2 jen w b12 wan 3 mik h xyy gon 4 rob b d20 rie
Let's discuss the first problem you identify:
@SannaSanna wrote:
... stuff deleted ...
but I have also used Proc SQL to tie the tables together and still missing records.
"Missing records" is way too amibiguous to diagnose. But from what you have shown us, any missing records are not due to size of the data sets. Please provide an explicit example of a record you expected in the result, but was missing. And provide the initial records from which you expected it to be produced.
You should be able to extract those records into a sample, and then quickly run both your SQL code and DATA step merge code that resulted in missing records.
Show us those codes, and the log file that went with them. Then diagnosis (and maybe prescription) is possible.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.