DATA Step, Macro, Functions and more

Can I use proc sql to dynamically merge two datasets?

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 14
Accepted Solution

Can I use proc sql to dynamically merge two datasets?

Good afternoon and happy Friday, folks

I’m trying to automate a placement simulation of youth into residential treatment where they will have the highest likelihood of success. Success is operationalized as “not recidivating” within 3 years of entering treatment. Equations predicting recidivism have been generated for each location, and the equations have been applied to each individual in the scenario (based on youth characteristics like risk, age, etc., LOS). Each youth has predicted success rates for every location, which throws in a wrench: youth are not qualified for all of the treatment facilities for which they have predicted success rates. Indeed, treatment locations have differing, yet overlapping qualifications. Let’s take a made-up example. Johnny (ID # 5, below) is a 15-year-old boy with drug charges. He could have “predicted success rates” of 91% for location A, 88% for location B, 50% for location C, and 75% for location D. Johnny is most likely to be successful (i.e., not recidivate within three years of entering treatment) if he is treated at location A; unfortunately, location A only accepts youth who are 17 years old or older; therefore, Johnny would not qualify for treatment here. Alternatively, for Johnny, location B is the next best location. Let us assume that Johnny is qualified for location B, but that all of location-B beds are filled; so, we must now look to location D, as it is now Johnny’s “best available” option at 75%. The score so far: We are matching youth to available beds in location for which they qualify and might enjoy the greatest likelihood of success. Unfortunately, each location only has a certain number of available beds, and the number of available beds different across locations. The qualifications of entry into treatment facilities differ, yet overlap (e.g., 12-17 year-olds vs 14-20 year-olds). In order to simulate what placement decisions might look like based on success rates, I went through the scenario describe above for over 400 youth, by hand, in excel. It took me about a week. I’d like to use PROC SQL imbedded in a SAS MACRO to automate these placement scenarios with the ultimate goals of a) obtain the ability to bootstrap iterations in order to examine effect sizes across distributions, b) save time, and c) prevent further brain damage from banging my head again desk and wall in frustration whilst doing this by hand. Whilst never having had the necessity—nay—the privilege of using SQL in my typical roll as a researcher, I believe that this time has now come to pass and I’m excited about it! Honestly. I believe it has the capacity I’m looking for. Unfortunately, it is beating the devil out of me! Here’s what I’ve got cookin’ so far: I want to create and automate the placement simulation with the clever use of merging/joining/switching/or something like that. I have two datasets (tables). The first dataset contains all of the youth information (one row per youth; several columns with demographics, location ranks, which correspond to the predicted success rates). The order of rows in the youth dataset (was/will be randomly generated (to simulate the randomness with which youth enter the system and are subsequently place into treatment). Note that I will be “cleaning” the youth dataset prior to merging such that rank-column cells will only be populated for programs for which a respective youth qualifies. This should take the “does the youth even qualify for the program” problem out of the equation.

However, it still leaves the issue of availability left to be contended with in the scenario. The second dataset containing the treatment facility beds, with each row corresponding to an available bed in one of the treatment location; two columns contain bed numbers and location names. Each bed (row) has only one location cell populated, but locations will populate several cells.

Thus, in descending order, I want to merge each youth row with the available bed that represents his/her best chance of success, and so the merge/join/switch/thing should take place

on youth.Rank1= distinct TF.Location, 
and if youth.Rank1 TF.location then 
merge on youth.Rank2= TF.location, 
if youth.Rank2 TF.location then merge at 
youth.Rank3 = TF.location, etc.

Put plainly: “Merge on rank1 unless rank1 location is no longer available, then merge on rank2, unless rank2 location is no longer available, and on down the line, etc., etc., until all option are exhausted and foster care (i.e., alternative services). Is the only option.
I’ve had no success getting this to work. I haven’t even been successful getting the union function to work. About the only successful thing I’ve done in SQL so far is create a view of a single dataset. It’s pretty sad. I’ve been following this guidance, but I get hung up around the “where” command:

proc sql; /Calls the SQL procedure*/;
create table x as /*Tells SAS to create a table called x*/
select /*Specifies the column(s) to be selected*/
from /*Specificies the tables(s) (data sets) to be queried*/
where /*Subjests the data based on a condition*/
group by /*Classifies the data into groups based on the specified 
column(s)*/
order by /*Sorts the resulting rows observations) by the specified 
column(s)*/
; quit; /*Ends the proc sql procedure*/

Frankly, I’m stuck and I could use some advice. This greenhorn in me is in way over his head.

I appreciate any help or guidance anyone might lend.

Cheers!

P


Accepted Solutions
Solution
‎06-12-2017 03:47 PM
Super User
Posts: 10,550

Re: Can I use proc sql to dynamically merge two datasets?

[ Edited ]

In general SQL is not a good choice when order of operation is required. SQL is built around set operations (intersect union and such) not if first or second is not available use n'th choice.

 

Examining multiple rows and columns at the same time is more of a SAS/IML program

 

Your example output data has different ranks that input:

 

8484 13 f D C  

vs

8484 13 f B A C D   C 8

 

 

 

View solution in original post


All Replies
Occasional Contributor
Posts: 14

Can I use proc sql to dynamically merge two dataset in order to simulate placement decisions?

Good afternoon and happy Friday, folks

I’m trying to automate a placement simulation of youth into residential treatment where they will have the highest likelihood of success. Success is operationalized as “not recidivating” within 3 years of entering treatment. Equations predicting recidivism have been generated for each location, and the equations have been applied to each individual in the scenario (based on youth characteristics like risk, age, etc., LOS). Each youth has predicted success rates for every location, which throws in a wrench: youth are not qualified for all of the treatment facilities for which they have predicted success rates. Indeed, treatment locations have differing, yet overlapping qualifications. Let’s take a made-up example. Johnny (ID # 5, below) is a 15-year-old boy with drug charges. He could have “predicted success rates” of 91% for location A, 88% for location B, 50% for location C, and 75% for location D. Johnny is most likely to be successful (i.e., not recidivate within three years of entering treatment) if he is treated at location A; unfortunately, location A only accepts youth who are 17 years old or older; therefore, Johnny would not qualify for treatment here. Alternatively, for Johnny, location B is the next best location. Let us assume that Johnny is qualified for location B, but that all of location-B beds are filled; so, we must now look to location D, as it is now Johnny’s “best available” option at 75%. The score so far: We are matching youth to available beds in location for which they qualify and might enjoy the greatest likelihood of success. Unfortunately, each location only has a certain number of available beds, and the number of available beds different across locations. The qualifications of entry into treatment facilities differ, yet overlap (e.g., 12-17 year-olds vs 14-20 year-olds). In order to simulate what placement decisions might look like based on success rates, I went through the scenario describe above for over 400 youth, by hand, in excel. It took me about a week. I’d like to use PROC SQL imbedded in a SAS MACRO to automate these placement scenarios with the ultimate goals of a) obtain the ability to bootstrap iterations in order to examine effect sizes across distributions, b) save time, and c) prevent further brain damage from banging my head again desk and wall in frustration whilst doing this by hand. Whilst never having had the necessity—nay—the privilege of using SQL in my typical roll as a researcher, I believe that this time has now come to pass and I’m excited about it! Honestly. I believe it has the capacity I’m looking for. Unfortunately, it is beating the devil out of me! Here’s what I’ve got cookin’ so far: I want to create and automate the placement simulation with the clever use of merging/joining/switching/or something like that. I have two datasets (tables). The first dataset contains all of the youth information (one row per youth; several columns with demographics, location ranks, which correspond to the predicted success rates). The order of rows in the youth dataset (was/will be randomly generated (to simulate the randomness with which youth enter the system and are subsequently place into treatment). Note that I will be “cleaning” the youth dataset prior to merging such that rank-column cells will only be populated for programs for which a respective youth qualifies. This should take the “does the youth even qualify for the program” problem out of the equation.

However, it still leaves the issue of availability left to be contended with in the scenario. The second dataset containing the treatment facility beds, with each row corresponding to an available bed in one of the treatment location; two columns contain bed numbers and location names. Each bed (row) has only one location cell populated, but locations will populate several cells.

Thus, in descending order, I want to merge each youth row with the available bed that represents his/her best chance of success, and so the merge/join/switch/thing should take place

on youth.Rank1= distinct TF.Location, 
and if youth.Rank1 TF.location then 
merge on youth.Rank2= TF.location, 
if youth.Rank2 TF.location then merge at 
youth.Rank3 = TF.location, etc.

Put plainly: “Merge on rank1 unless rank1 location is no longer available, then merge on rank2, unless rank2 location is no longer available, and on down the line, etc., etc., until all option are exhausted and foster care (i.e., alternative services). Is the only option.
I’ve had no success getting this to work. I haven’t even been successful getting the union function to work. About the only successful thing I’ve done in SQL so far is create a view of a single dataset. It’s pretty sad. I’ve been following this guidance, but I get hung up around the “where” command:

proc sql; /Calls the SQL procedure*/;
create table x as /*Tells SAS to create a table called x*/
select /*Specifies the column(s) to be selected*/
from /*Specificies the tables(s) (data sets) to be queried*/
where /*Subjests the data based on a condition*/
group by /*Classifies the data into groups based on the specified 
column(s)*/
order by /*Sorts the resulting rows observations) by the specified 
column(s)*/
; quit; /*Ends the proc sql procedure*/

Frankly, I’m stuck and I could use some advice. This greenhorn in me is in way over his head.

I appreciate any help or guidance anyone might lend.

Cheers!

P

PROC Star
Posts: 264

Re: Can I use proc sql to dynamically merge two dataset in order to simulate placement decisions?

if you can give small amount of data of what you want and have then someone can easily help you

Super User
Posts: 10,550

Re: Can I use proc sql to dynamically merge two dataset in order to simulate placement decisions?

Repeating the exact same message does not add any information needed to provide solutions.

 

I suggest also paying a little attention to text formatting. It is hard to tell what is imporant or where a "rule" starts (or ends).

 

Frankly we do not need to know what any variables represent in most cases but what the input data sets look like and what the desired output for that given example input should be is crucial.

 

And one should not start using SAS Macros if you do not have basic code to accomplish the task first.

Occasional Contributor
Posts: 14

Re: Can I use proc sql to dynamically merge two dataset in order to simulate placement decisions?

Didn't mean to do that Smiley Happy.

 

Again, it is the sql commands that I'm trying to learn. I've been quite successful using other "procs" in macros.

Respected Advisor
Posts: 3,124

Re: Can I use proc sql to dynamically merge two datasets?

it is really hard to envision any pratical approaches before you show some sample data, both HAVE and WANT.

For example, when you say 'not available' (which is a key concept in your description), which of the following do you really mean in data language:

1. The field is missing.

2. The field is not missing, but the value is not matching.

3. Others?

 

If it is No1. SAS has a straightforward function coalesce() worth looking up. Could be as simple as:


proc sql;
create table want as
select * from 
h1 
left join 
h2
on h1.location=coalesce(h2.rank1,h2.rank2, h3.rank3...down then line etc)
;
run;

 

If it is No2 or 3, then it may involve some more complex coding. However, without seeing your data structure, this is all guessing.

Occasional Contributor
Posts: 14

Re: Can I use proc sql to dynamically merge two datasets?

 Really trying to do that:

 

Youth dataset

 

Y_ID age sex rank1 rank2 rank3 rank4 alternative services
122 12 m B C      
15 14 m B C      
5 15 m B D C    
666 16 f C D      
561 16 m D B C    
8 21 m A D C B  
46 14 m D C B    
5555 17 m A B D C  
8484 13 f D C      
48 5 m C        

 

 

Location Dataset:

 

Bed_ID Location
1 A
2 B
3 C
4 D
5 D
6 D
7 A
8 C
9 B
10 D

 

proposed merge would look like:

 

Y_ID age sex rank1 rank2 rank3 rank4 alternative services location Bed_ID
122 12 m B C       B 2
15 14 m B C       B 9
5 15 m B D C     D 4
666 16 f C D       C 3
561 16 m D B C     D 5
8 21 m A D C B   A 1
46 14 m D C B     D 6
5555 17 m A B D C   A 7
8484 13 f B A C D   C 8
48 5 m C B A D   D 10
Super User
Posts: 10,550

Re: Can I use proc sql to dynamically merge two datasets?

So what is the rule that assigns a location and Bed_id pair to a specific Y_ID value?

Occasional Contributor
Posts: 14

Re: Can I use proc sql to dynamically merge two datasets?

Rank1 then rank2 then rank3. etc.

 

I want the location  name in rank1 to pair with a location  name in the location column. If the location name in rank1 cannot be found in the location column (because it has been previously assigned), then I want it to pair based on rank2 and on down the line. Again, I don't even know if this is possible....

Solution
‎06-12-2017 03:47 PM
Super User
Posts: 10,550

Re: Can I use proc sql to dynamically merge two datasets?

[ Edited ]

In general SQL is not a good choice when order of operation is required. SQL is built around set operations (intersect union and such) not if first or second is not available use n'th choice.

 

Examining multiple rows and columns at the same time is more of a SAS/IML program

 

Your example output data has different ranks that input:

 

8484 13 f D C  

vs

8484 13 f B A C D   C 8

 

 

 

Occasional Contributor
Posts: 14

Re: Can I use proc sql to dynamically merge two datasets?

Oops. This is not the actual data I'm using. Just for illustrative purposes. I can't post my actual data. It is sensitive.
Occasional Contributor
Posts: 14

Re: Can I use proc sql to dynamically merge two datasets?

I appreciate your expertise! Thank you. I'll give both of these a shot and see if I can't figure it out. Would it be possible to "pick your brain a little" offline, so to speak? About the solutions y'all provideed? Email of course? Cheers! P

Occasional Contributor
Posts: 14

Re: Can I use proc sql to dynamically merge two datasets?

I'm not sure why, but the actual syntax solution that was in the email I recieved does not appear in the message-board thingy. Any road, the solution that you provided worked brilliantly! I was able to modify it such that it worked on a different dataset. So, again, thank you! Well done! This is brilliant! I can't thank you enough. If you can get to Oregon, you are invited to the whole hog BBQ I'm hosting in the backyard on the 24th! It would be a pleasure.  Cheers! P

 

PS> I'm sure I'm going to run into further question,  so thank you in advance for your patience.

 

 

Occasional Contributor
Posts: 14

Re: Can I use proc sql to dynamically merge two datasets?

Would it be too bother some to highlight specific areas of the syntax, so I gain a deeper understanding of the process you've created here?
Respected Advisor
Posts: 3,124

Re: Can I use proc sql to dynamically merge two datasets?

[ Edited ]

This is a not bullet-proof approach, I need to leave office now so I may get back to you if this code needs patch. Anyway, hope this can get you started;

 

data youth;
	infile cards truncover;
	input (Y_ID age sex rank1 rank2 rank3 rank4) (:$8.);
	cards;
122 12 m B C       
15 14 m B C       
5 15 m B D C     
666 16 f C D       
561 16 m D B C     
8 21 m A D C B   
46 14 m D C B     
5555 17 m A B D C   
8484 13 f D C       
48 5 m C         
;

data location;
	input Bed_ID $ Location $;
	cards;
1 A 
2 B 
3 C 
4 D 
5 D 
6 D 
7 A 
8 C 
9 B 
10 D 
;

data want;
	if _n_=1 then
		do;
			if 0 then
				set location;
			declare hash h(dataset:'location', multidata:'y' );
			h.definekey('location');
	h.definedata(all:
			'y');
			h.definedone();
		end;

	set youth;
	array rank rank:;
	do over rank;
		if h.find(key:rank) = 0 then
			do;
				output;
				rc=h.removedup(key:rank);
				leave;
			end;
	end;

	drop rc;
run;

 btw, I doubt SQL will support the feature you are looking for, even it could, it can be very complex.  

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 15 replies
  • 215 views
  • 2 likes
  • 4 in conversation