Solved: Re: Problems merging with if statement...

GKati · Posted 05-19-2017 03:54 PM

HI Everyone,

I have 2 datasets (clients and hospitals). The common variable between the two is municipality. Based on the municipality I want to randomly assign clients to hospitals. I know the ratio of the local population that goes to each hospital, that is the upper and lower variable.

In other words, I would like to merge observations where 1. the municipality matches 2. u random number falls between lower and upper bound.

My problem is that hospitals seem to be only assigned once, and I want them to be allocated to each observation based on the above rules. Sort of like a many-to-many merge...

What am I doing wrong?

data clients;

input client municipality;

datalines;

502 12

503 18

505 12

706 44

807 51

508 12

709 12

801 44

;

run;

proc sort data=clients out=clients;

BY client;

run;

Data clients;

set clients;

by client;

u = rand("Uniform");

run;

data hospitals;

input municipality hospital_id lower upper;

datalines;

12 2 0 0.3

12 3 0.3 1

18 7 0 0.1

18 10 0.1 1

51 134 0.5 1

51 99 0 0.2

51 123 0.2 1

44 123 0 1

;

run;

proc sort data=clients out=clients;

BY municipality;

run;

proc sort data=hospitals out=hospitals;

by municipality;

run;

DATA work.merge;

MERGE clients hospitals;

BY municipality;

if lower<u and u<upper;

RUN;

ballardw · Posted 05-19-2017 04:15 PM

You might show what you want for output.

You are doing what is called a many-to-many merge: many municipalities (from the client data) to many hospitals. The data step merge is seldom the appropriate tool for that type of merge. The good news is Proc Sql does so very well:

proc sql;
   create table want as
   select a.*, b.hospital_id 
   from clients as a left join hospitals as b
      on a.municipality=b.municipality
   where a.u between b.lower and b.upper
   ;
quit;

Also sql does not require the explicit sort beforehand.

View solution in original post

ballardw · Posted 05-19-2017 04:15 PM

You might show what you want for output.

You are doing what is called a many-to-many merge: many municipalities (from the client data) to many hospitals. The data step merge is seldom the appropriate tool for that type of merge. The good news is Proc Sql does so very well:

proc sql;
   create table want as
   select a.*, b.hospital_id 
   from clients as a left join hospitals as b
      on a.municipality=b.municipality
   where a.u between b.lower and b.upper
   ;
quit;

Also sql does not require the explicit sort beforehand.

art297 · Posted 05-19-2017 04:23 PM

I'd suggest code similar to what @ballardw suggested, but I'd use a left join so that you will still output clients who don't have a matching municipality. Also, in your specs you said that you didn't want to include the two cutoffs. If that is true, you can't use the BETWEEN operator:

data clients;
  input client municipality;
  datalines;
502 12   
503 18
505 12
706 44
807 51
508 12
709 12
801 44
;
run;
 
Data clients;
  set clients;
  u = rand("Uniform");
run;
 
data hospitals;
  input municipality hospital_id lower upper;
  datalines;
12 2 0 0.3
12 3 0.3 1
18 7 0 0.1
18 10 0.1 1
51 134 0.5 1
51 99 0 0.2
51 123 0.2 1
44 123 0 1
;
run;
 
proc sql noprint;
  create table work.merge as
    select a.*, b.hospital_id
      from clients a
        left join hospitals b
          on a.municipality=b.municipality
            where lower<u<upper
  ;
quit;

Art, CEO, AnalystFinder.com

ballardw · Posted 05-19-2017 04:48 PM

@art297 when I tested my code with between it seemed to work okay and did not require the bounds in the output. Could this be difference between SAS and some other SQL or maybe the older version of SAS (9.2) I have here?

art297 · Posted 05-19-2017 05:18 PM

@ballardw: @GKati's specs stated: if lower<u and u<upper;

I was simply pointing out that the between operator is the same as lower<=u<=upper.

I think that has been true ever since SAS incorporated the between operator

Art, CEO, AnalystFinder.com

ballardw · Posted 05-19-2017 05:43 PM

@art297 wrote:

@ballardw: @GKati's specs stated: if lower<u and u<upper;

I was simply pointing out that the between operator is the same as lower<=u<=upper.

I think that has been true ever since SAS incorporated the between operator

Art, CEO, AnalystFinder.com

Gotcha.

GKati · Posted 05-21-2017 09:10 AM

I tried both solutions and they both seem to work well. I accepted ballardw's solution just because he was first.

Thanks for your help.

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away