BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
GKati
Pyrite | Level 9

HI Everyone, 

 

I have 2 datasets (clients and hospitals). The common variable between the two is municipality. Based on the municipality I want to randomly assign clients to hospitals. I know the ratio of the local population that goes to each hospital, that is the upper and lower variable.

 

In other words, I would like to merge observations where 1. the municipality matches 2. u random number falls between lower and upper bound.

 

My problem is that hospitals seem to be only assigned once, and I want them to be allocated to each observation based on the above rules. Sort of like a many-to-many merge... 

 

What am I doing wrong?

 

data clients;

input client municipality;

datalines;

502 12   

503 18

505 12

706 44

807 51

508 12

709 12

801 44

;

run;

 

proc sort data=clients out=clients;

BY client;

run;

 

Data clients;

set clients;

by client;

u = rand("Uniform");

run;

 

data hospitals;

input municipality hospital_id lower upper;

datalines;

12 2 0 0.3

12 3 0.3 1

18 7 0 0.1

18 10 0.1 1

51 134 0.5 1

51 99 0 0.2

51 123 0.2 1

44 123 0 1

;

run;

 

proc sort data=clients out=clients;

BY municipality;

run;

 

proc sort data=hospitals out=hospitals;

by municipality;

run;

 

DATA work.merge;

MERGE clients hospitals;

BY municipality;

if lower<u and u<upper;

RUN;

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

You might show what you want for output.

You are doing what is called a many-to-many merge: many municipalities (from the client data) to many hospitals. The data step merge is seldom the appropriate tool for that type of merge. The good news is Proc Sql does so very well:

 

proc sql;
   create table want as
   select a.*, b.hospital_id 
   from clients as a left join hospitals as b
      on a.municipality=b.municipality
   where a.u between b.lower and b.upper
   ;
quit;

Also sql does not require the explicit sort beforehand.

 

View solution in original post

6 REPLIES 6
ballardw
Super User

You might show what you want for output.

You are doing what is called a many-to-many merge: many municipalities (from the client data) to many hospitals. The data step merge is seldom the appropriate tool for that type of merge. The good news is Proc Sql does so very well:

 

proc sql;
   create table want as
   select a.*, b.hospital_id 
   from clients as a left join hospitals as b
      on a.municipality=b.municipality
   where a.u between b.lower and b.upper
   ;
quit;

Also sql does not require the explicit sort beforehand.

 

art297
Opal | Level 21

I'd suggest code similar to what @ballardw suggested, but I'd use a left join so that you will still output clients who don't have a matching municipality. Also, in your specs you said that you didn't want to include the two cutoffs. If that is true, you can't use the BETWEEN operator:

data clients;
  input client municipality;
  datalines;
502 12   
503 18
505 12
706 44
807 51
508 12
709 12
801 44
;
run;
 
Data clients;
  set clients;
  u = rand("Uniform");
run;
 
data hospitals;
  input municipality hospital_id lower upper;
  datalines;
12 2 0 0.3
12 3 0.3 1
18 7 0 0.1
18 10 0.1 1
51 134 0.5 1
51 99 0 0.2
51 123 0.2 1
44 123 0 1
;
run;
 
proc sql noprint;
  create table work.merge as
    select a.*, b.hospital_id
      from clients a
        left join hospitals b
          on a.municipality=b.municipality
            where lower<u<upper
  ;
quit;

Art, CEO, AnalystFinder.com

 

ballardw
Super User

@art297 when I tested my code with between it seemed to work okay and did not require the bounds in the output. Could this be difference between SAS and some other SQL or maybe the older version of SAS (9.2) I have here?

art297
Opal | Level 21

@ballardw@GKati's specs stated: if lower<u and u<upper;

I was simply pointing out that the between operator is the same as lower<=u<=upper.

 

I think that has been true ever since SAS incorporated the between operator

 

Art, CEO, AnalystFinder.com

 

ballardw
Super User

@art297 wrote:

@ballardw@GKati's specs stated: if lower<u and u<upper;

I was simply pointing out that the between operator is the same as lower<=u<=upper.

 

I think that has been true ever since SAS incorporated the between operator

 

Art, CEO, AnalystFinder.com

 


Gotcha.

GKati
Pyrite | Level 9

I tried both solutions and they both seem to work well. I accepted ballardw

 

 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 7938 views
  • 2 likes
  • 3 in conversation