03-15-2017 09:48 PM
I need help in joining three tables using proc SQL and then calculate percentages for some variables. Any help on this is greatly appreciated. My three tables look like:
input ID Region$ Name$ Tier$ product$;
101 Albany Chris A Furniture
102 Chicago Donald A Furniture
103 Trenton Sam B Furniture
104 Philadelphia Rose C Furniture
105 Bridgeport Mary D Furniture
101 Santafe Suzanne B Furniture
105 Utah King A Furniture
106 Syracuse Queen B Furniture
input ID Region$ Name$ Tier$ product$ calls call_type;
101 Albany Chris A Furniture 10 Detail
102 Chicago Donald A Furniture 12 Webex
103 Trenton Sam B Furniture 13 Detail
104 Philadelphia Rose C Furniture 13 Scheduled appointment
105 Bridgeport Mary D Furniture 12 Detail
101 Santafe Suzanne B Furniture 14 Webex
105 Utah King A Furniture 10 Webex
106 Syracuse Queen B Furniture 12 Scheduled appointment
input ID Region$ Name$ ;
101 Albany Chris
102 Chicago Donald
103 Trenton Sam
104 Philadelphia Rose
105 Bridgeport Mary
101 Santa Fe Suzanne
105 Utah King
106 Syracuse Queen
So table 1 have only product information, table 2 have a number of calls for each product and table3 have region IDs and the person's names.
I want to combine all the datasets using proc SQL and calculate % of calls for each tier (A, B, C only) and the % frequency of all the calls and also call types (Detail, Webex, scheduled appointment). I need the following columns in the final output:
These are just the rough numbers not the actual ones after calculation, this is just for sample output.
03-15-2017 10:12 PM
Since we probably don't want to do all of your work, what have you tried and where specifically are you running into problems.
03-16-2017 04:55 AM
A) Why are you fixed with SQL?
B) It's generally not a good idea to store percentages. Have this calculated in a report step instead, like in PROC TABULATE/REPORT. Or in a semantic layer, if you have one.
03-16-2017 01:32 PM
From your example I do not see Test3 contributing anything. So what would be the purpose of "merging" test 3?
Or did you miss stating the you want to select the records from test and test1 to only include matches for test3?
And similarly everything in test is also in test1. So why bother with test at all?
Also your data step for test1 does not work correctly because the variables product and call_type (which should be character and and not indicated as such) will have the length, not specified, default to 8 characters. Also since you are using list input call_type will only have the first word when you want "Scheduled appointment".
You should also specify exactly what the numerator and denominator is for each of those percentages as it makes little sense as it is not obvious.
AND your result seems to have changed the regions for some names. The data shows Donald in Chicago and Mary in Bridgeport.