Hi,
I need help in joining three tables using proc SQL and then calculate percentages for some variables. Any help on this is greatly appreciated. My three tables look like:
data test;
infile datalines;
input ID Region$ Name$ Tier$ product$;
cards;
101 Albany Chris A Furniture
102 Chicago Donald A Furniture
103 Trenton Sam B Furniture
104 Philadelphia Rose C Furniture
105 Bridgeport Mary D Furniture
101 Santafe Suzanne B Furniture
105 Utah King A Furniture
106 Syracuse Queen B Furniture
;
run;
Data test1;
infile datalines;
input ID Region$ Name$ Tier$ product$ calls call_type;
cards;
101 Albany Chris A Furniture 10 Detail
102 Chicago Donald A Furniture 12 Webex
103 Trenton Sam B Furniture 13 Detail
104 Philadelphia Rose C Furniture 13 Scheduled appointment
105 Bridgeport Mary D Furniture 12 Detail
101 Santafe Suzanne B Furniture 14 Webex
105 Utah King A Furniture 10 Webex
106 Syracuse Queen B Furniture 12 Scheduled appointment
;
run;
Data test3;
infile datalines;
input ID Region$ Name$ ;
cards;
101 Albany Chris
102 Chicago Donald
103 Trenton Sam
104 Philadelphia Rose
105 Bridgeport Mary
101 Santa Fe Suzanne
105 Utah King
106 Syracuse Queen
;
run;
So table 1 have only product information, table 2 have a number of calls for each product and table3 have region IDs and the person's names.
I want to combine all the datasets using proc SQL and calculate % of calls for each tier (A, B, C only) and the % frequency of all the calls and also call types (Detail, Webex, scheduled appointment). I need the following columns in the final output:
ID | Region | Name | %Calls | %frequency | Detail | Webex | Scheduled appointment |
101 | Albany | Chris | 67.0% | 25.0% | 10 | 10 | 12 |
102 | Trenton | Donald | 50.0% | 30.0% | 12 | 11 | 13 |
103 | Santafe | Mary | 45.0% | 35.0% | 13 | 12 | 11 |
These are just the rough numbers not the actual ones after calculation, this is just for sample output.
Thank you
M
Since we probably don't want to do all of your work, what have you tried and where specifically are you running into problems.
A) Why are you fixed with SQL?
B) It's generally not a good idea to store percentages. Have this calculated in a report step instead, like in PROC TABULATE/REPORT. Or in a semantic layer, if you have one.
From your example I do not see Test3 contributing anything. So what would be the purpose of "merging" test 3?
Or did you miss stating the you want to select the records from test and test1 to only include matches for test3?
And similarly everything in test is also in test1. So why bother with test at all?
Also your data step for test1 does not work correctly because the variables product and call_type (which should be character and and not indicated as such) will have the length, not specified, default to 8 characters. Also since you are using list input call_type will only have the first word when you want "Scheduled appointment".
You should also specify exactly what the numerator and denominator is for each of those percentages as it makes little sense as it is not obvious.
AND your result seems to have changed the regions for some names. The data shows Donald in Chicago and Mary in Bridgeport.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.