BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
shounster
Obsidian | Level 7

Going to give this a shot without all of the pesky code I have written that doesn't work.  😉

 

I have one table (master) in which two different summary tables are created (hospital and mixed).  After the summaries are created, I would like to merge the two summaries (hospital and mixed) back together on a series linked variables to a table (final)

 

To this point, PROC SQL and Group By have been used to create the summaries as well as PROC SQL to join the summaries together.  The service_area field below is based on the hospital's zip code being within a designated service area.  Essentially created by using a format to map a zip to the service area name. Pretty straightforward.

 

Hospital Table (summarized):

data hospital_sum;

input service_area $ drg $ payor_group $ age_group $ market_in_mig client_in_mkt;

datalines;

Toledo 935 govt age0-5 1 1

Toledo 935 govt age18-24 1 0

Toledo 935 govt age25-34 1 0

Toledo 935 govt age35-44 0 1

Toledo 935 comm age18-24 1 0

Toledo 935 comm age24-34 1 0

Toledo 935 comm age55-64 1 0

Toledo 935 comm age65-74 1 0

Toledo 935 caid age0-5 1 2

Toledo 935 caid age35-44 0 2

Toledo 935 caid age45-54 0 1

Toledo 935 care age35-44 0 1

Toledo 935 care age45-54 0 1

Toledo 935 care age55-64 1 1

Toledo 935 care age75-84 0 3

Toledo 935 self_pay age45-54 0 1

;

run;

 

The "service_area" in this table is created by determining whether or not a patient that visited the hospital is from the hospital's service area.  If the patient was from the hospital service area, then the "service_area" is the same as above.  If the patient is not within the hospital service area, this table's "service_area" contains the service area of the patient (in migration from the hospital's perspective)

 

Mixed Table (summarized):

data mixed_sum;

input service_area $ drg $ payor_group $ age_group $ market_total

datalines;

Lima 935 govt age18-24 1

Lima 935 govt age25-34 1

Lima 935 govt age55-64 1

Lima 935 caid age0-5 1

Toledo 935 comm age0-5 1

Toledo 935 comm age18-24 0

Toledo 935 comm age25-34 0

Toledo 935 comm age35-44 1

Toledo 935 govt age64-74 0

Toledo 935 caid age0-5 2

Toledo 935 caid age35-44 2

Toledo 935 caid age45-54 1

Toledo 935 care age35-44 1

Toledo 935 care age45-54 1

Toledo 935 care age55-64 1

Toledo 935 care age75-84 3

Toledo 935 self_pay age45-54 1

;

run;

 

When you merge these two files by service_area, drg, payor_group, and age_group, you will see that the market_in_mig loses records.  It goes from 9 to 6.  I understand the matching to these levels is going to drop data in proc sql because that particular match doesn't exist.  If you remove age_group, the fields roll up nicely.  That extra level is messing me up.  The other obvious issue is that the service_areas between the two tables are not truly the same (apples-to-apples) in the way they were created.

 

Am I trying to join/merge tables that can't be based on the differing service_area definitions OR can I force all of the age_groups to appear (there are 10 total) in both tables and successfully merge the tables that way?  If so, how do I keep all of the age groups using PROC SQL?

 

I think they are looking at two differing results but the client insists it can be on 'one page'.  I'm struggling with PROC SQL (left join, full, full outer, etc) versus MERGE as being the way to bring these together.

 

Any ideas at this point will be grateful.  I can provide more information if necessary.

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Is the final result to be a data set or a report that people read?

One of the differences is that some of the reporting procedures can use a complete format option such as PRELOADFMT to require all of the formatted levels of a variable to be reported even though they are not in the data.

You mention 10 age groups but your data only provides examples for 7 so it would be very hard to provide a complete example.

data work.hospital_sum;
input service_area $ drg $ payor_group $ age_group $ market_in_mig client_in_mkt;
datalines;
Toledo 935 govt age0-5 1 1
Toledo 935 govt age18-24 1 0
Toledo 935 govt age25-34 1 0
Toledo 935 govt age35-44 0 1
Toledo 935 comm age18-24 1 0
Toledo 935 comm age24-34 1 0
Toledo 935 comm age55-64 1 0
Toledo 935 comm age65-74 1 0
Toledo 935 caid age0-5 1 2
Toledo 935 caid age35-44 0 2
Toledo 935 caid age45-54 0 1
Toledo 935 care age35-44 0 1
Toledo 935 care age45-54 0 1
Toledo 935 care age55-64 1 1
Toledo 935 care age75-84 0 3
Toledo 935 self_pay age45-54 0 1
;
run;
data work.mixed_sum;
input service_area $ drg $ payor_group $ age_group $ market_total;
datalines;
Lima 935 govt age18-24 1
Lima 935 govt age25-34 1
Lima 935 govt age55-64 1
Lima 935 caid age0-5 1
Toledo 935 comm age0-5 1
Toledo 935 comm age18-24 0
Toledo 935 comm age25-34 0
Toledo 935 comm age35-44 1
Toledo 935 govt age64-74 0
Toledo 935 caid age0-5 2
Toledo 935 caid age35-44 2
Toledo 935 caid age45-54 1
Toledo 935 care age35-44 1
Toledo 935 care age45-54 1
Toledo 935 care age55-64 1
Toledo 935 care age75-84 3
Toledo 935 self_pay age45-54 1
;
run;

proc sort data=work.hospital_sum ;
   by service_area drg payor_group age_group;
run;
proc sort data=work.mixed_sum ;
   by service_area drg payor_group age_group;
run;

data want;
   merge work.hospital_sum work.mixed_sum;
   by service_area drg payor_group age_group;
run;

proc format library=work;
value $agegroup
'age0-5'  ='age0-5'  
'age05-17'='age05-17'
'age18-24'='age18-24' 
'age25-34'='age25-34' 
'age35-44'='age35-44' 
'age45-54'='age45-54' 
'age55-64'='age55-64' 
'age75-84'='age75-84' 
;
run;

proc tabulate data=want;
   class service_area drg payor_group ;
   class age_group/preloadfmt ;
   format age_group $agegroup.;
   var market_in_mig client_in_mkt  market_total;
   table service_area* drg* payor_group*age_group,
         (market_in_mig client_in_mkt  market_total)*max=' '*f=best5.
         /printmiss misstext=' '
   ;
run;

The MISSTEXT option could also show 0 instead if  desired.

 

The issue of merging on not quite the same service area values is going to require a lot more knowledge about the data sources and possibly it might be better to go back and not use summarized data.

 

It helps to post code into a code box using the forum's {I} icon. The message windows here will reformat text and often affect the code in ways that data steps may not run. And sometimes it appears that "invisible" html codes get inserted generating very hard to  diagnose errors. The code box treats code as plain text and is much easier to use.

View solution in original post

4 REPLIES 4
mbuchecker
Quartz | Level 8

It's helpful if you would include your desired output.

Michelle
shounster
Obsidian | Level 7
I do apologize about the lack of output. It's been a bit of a conundrum since there are two service areas at play and trying to determine if I need to use both versus one versus some other way of grouping.
ballardw
Super User

Is the final result to be a data set or a report that people read?

One of the differences is that some of the reporting procedures can use a complete format option such as PRELOADFMT to require all of the formatted levels of a variable to be reported even though they are not in the data.

You mention 10 age groups but your data only provides examples for 7 so it would be very hard to provide a complete example.

data work.hospital_sum;
input service_area $ drg $ payor_group $ age_group $ market_in_mig client_in_mkt;
datalines;
Toledo 935 govt age0-5 1 1
Toledo 935 govt age18-24 1 0
Toledo 935 govt age25-34 1 0
Toledo 935 govt age35-44 0 1
Toledo 935 comm age18-24 1 0
Toledo 935 comm age24-34 1 0
Toledo 935 comm age55-64 1 0
Toledo 935 comm age65-74 1 0
Toledo 935 caid age0-5 1 2
Toledo 935 caid age35-44 0 2
Toledo 935 caid age45-54 0 1
Toledo 935 care age35-44 0 1
Toledo 935 care age45-54 0 1
Toledo 935 care age55-64 1 1
Toledo 935 care age75-84 0 3
Toledo 935 self_pay age45-54 0 1
;
run;
data work.mixed_sum;
input service_area $ drg $ payor_group $ age_group $ market_total;
datalines;
Lima 935 govt age18-24 1
Lima 935 govt age25-34 1
Lima 935 govt age55-64 1
Lima 935 caid age0-5 1
Toledo 935 comm age0-5 1
Toledo 935 comm age18-24 0
Toledo 935 comm age25-34 0
Toledo 935 comm age35-44 1
Toledo 935 govt age64-74 0
Toledo 935 caid age0-5 2
Toledo 935 caid age35-44 2
Toledo 935 caid age45-54 1
Toledo 935 care age35-44 1
Toledo 935 care age45-54 1
Toledo 935 care age55-64 1
Toledo 935 care age75-84 3
Toledo 935 self_pay age45-54 1
;
run;

proc sort data=work.hospital_sum ;
   by service_area drg payor_group age_group;
run;
proc sort data=work.mixed_sum ;
   by service_area drg payor_group age_group;
run;

data want;
   merge work.hospital_sum work.mixed_sum;
   by service_area drg payor_group age_group;
run;

proc format library=work;
value $agegroup
'age0-5'  ='age0-5'  
'age05-17'='age05-17'
'age18-24'='age18-24' 
'age25-34'='age25-34' 
'age35-44'='age35-44' 
'age45-54'='age45-54' 
'age55-64'='age55-64' 
'age75-84'='age75-84' 
;
run;

proc tabulate data=want;
   class service_area drg payor_group ;
   class age_group/preloadfmt ;
   format age_group $agegroup.;
   var market_in_mig client_in_mkt  market_total;
   table service_area* drg* payor_group*age_group,
         (market_in_mig client_in_mkt  market_total)*max=' '*f=best5.
         /printmiss misstext=' '
   ;
run;

The MISSTEXT option could also show 0 instead if  desired.

 

The issue of merging on not quite the same service area values is going to require a lot more knowledge about the data sources and possibly it might be better to go back and not use summarized data.

 

It helps to post code into a code box using the forum's {I} icon. The message windows here will reformat text and often affect the code in ways that data steps may not run. And sometimes it appears that "invisible" html codes get inserted generating very hard to  diagnose errors. The code box treats code as plain text and is much easier to use.

shounster
Obsidian | Level 7
Thank you very much for taking the time to review the code with little data or expected outcome. Part of the issue here has to do with the requested service areas. Since there is one that is related to the facility and one that is a hybrid of hospital and patient origin, it makes it a bit of a challenge to create output that can be utilized in Tableau.

Thank you for the reminder regarding the {I} icon. I completely whiffed on that and I apologize.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 976 views
  • 0 likes
  • 3 in conversation