HI Everyone,
I have 2 datasets (clients and hospitals). The common variable between the two is municipality. Based on the municipality I want to randomly assign clients to hospitals. I know the ratio of the local population that goes to each hospital, that is the upper and lower variable.
In other words, I would like to merge observations where 1. the municipality matches 2. u random number falls between lower and upper bound.
My problem is that hospitals seem to be only assigned once, and I want them to be allocated to each observation based on the above rules. Sort of like a many-to-many merge...
What am I doing wrong?
data clients;
input client municipality;
datalines;
502 12
503 18
505 12
706 44
807 51
508 12
709 12
801 44
;
run;
proc sort data=clients out=clients;
BY client;
run;
Data clients;
set clients;
by client;
u = rand("Uniform");
run;
data hospitals;
input municipality hospital_id lower upper;
datalines;
12 2 0 0.3
12 3 0.3 1
18 7 0 0.1
18 10 0.1 1
51 134 0.5 1
51 99 0 0.2
51 123 0.2 1
44 123 0 1
;
run;
proc sort data=clients out=clients;
BY municipality;
run;
proc sort data=hospitals out=hospitals;
by municipality;
run;
DATA work.merge;
MERGE clients hospitals;
BY municipality;
if lower<u and u<upper;
RUN;
You might show what you want for output.
You are doing what is called a many-to-many merge: many municipalities (from the client data) to many hospitals. The data step merge is seldom the appropriate tool for that type of merge. The good news is Proc Sql does so very well:
proc sql; create table want as select a.*, b.hospital_id from clients as a left join hospitals as b on a.municipality=b.municipality where a.u between b.lower and b.upper ; quit;
Also sql does not require the explicit sort beforehand.
You might show what you want for output.
You are doing what is called a many-to-many merge: many municipalities (from the client data) to many hospitals. The data step merge is seldom the appropriate tool for that type of merge. The good news is Proc Sql does so very well:
proc sql; create table want as select a.*, b.hospital_id from clients as a left join hospitals as b on a.municipality=b.municipality where a.u between b.lower and b.upper ; quit;
Also sql does not require the explicit sort beforehand.
I'd suggest code similar to what @ballardw suggested, but I'd use a left join so that you will still output clients who don't have a matching municipality. Also, in your specs you said that you didn't want to include the two cutoffs. If that is true, you can't use the BETWEEN operator:
data clients; input client municipality; datalines; 502 12 503 18 505 12 706 44 807 51 508 12 709 12 801 44 ; run; Data clients; set clients; u = rand("Uniform"); run; data hospitals; input municipality hospital_id lower upper; datalines; 12 2 0 0.3 12 3 0.3 1 18 7 0 0.1 18 10 0.1 1 51 134 0.5 1 51 99 0 0.2 51 123 0.2 1 44 123 0 1 ; run; proc sql noprint; create table work.merge as select a.*, b.hospital_id from clients a left join hospitals b on a.municipality=b.municipality where lower<u<upper ; quit;
Art, CEO, AnalystFinder.com
@art297 when I tested my code with between it seemed to work okay and did not require the bounds in the output. Could this be difference between SAS and some other SQL or maybe the older version of SAS (9.2) I have here?
I tried both solutions and they both seem to work well. I accepted ballardw's solution just because he was first.
Thanks for your help.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.