I have a large data set that has multiple tiers to a C_ID. The C_ID serves as the foundation of the detail, from there a P_ID is associated and finally a segment_id. I'm trying to find the weighted average PART_C_D_PREMIUM of each P_ID that is associated to the parent C_ID by rolling up the segment_id's premiums.
I need to calculate weights based on frequency or number of segment_id's associated to each p_id and c_id. Not sure if a weighted mean is possible using this methodology, or if anyone has a better approach. If its not possible, what is the best way to find the mean premium and rolling up the segment_id's so that its just c_id and p_id with the corresponding means for each premium.
Example below of what I want my output to look like
C_ID | P_ID | parentorg | plan_type | plan_name | PART_C_D_Premium | Mean |
H05 | 067 | A | HM | BC | 0 | 0 |
H15 | 006 | A | LP | AM | 47,44,47 | 46 |
Here is my data set
data have;
infile datalines dlm="09"x;
input C_ID $ P_ID $ parentorg $ plan_type $ plan_name $ segment_id p_c_premium p_d_premium part_c_d_prem_combine;
datalines;
H05 067 A HM BC 1 0 0 0
H05 067 A HM BC 2 0 0 0
H10 237 H HM HG 1 0 0 0
H10 237 H HM HG 2 0 0 0
H15 006 A LP AM 1 0 47 47
H15 006 A LP AM 2 0 44 44
H15 006 A LP AM 4 0 47 47
H19 014 P HM PH 1 0 0 0
H19 014 P HM PH 2 25 0 25
H20 099 H HM HG 1 0 0 0
H20 099 H HM HG 2 0.6 22.4 23
H20 100 H HM HG 4 0 0 0
H20 100 H HM HG 5 19 0 19
H39 059 G LP GG 3 36.4 33.6 70
H39 059 G LP GG 4 35.9 39.1 75
H39 060 G LP GG 3 0 0 0
H39 060 G LP GG 4 0 0 0
H39 061 G LP GG 1 122.8 56.2 179
H39 061 G LP GG 2 102.8 56.2 159
H39 061 G LP GG 5 62.8 56.2 119
;
run;
Just to make sure I am understanding this properly
For example C_ID= H15 P_ID= 006 has three rows, with values 47, 44 and 47 and segment IDs of 1, 2 and 4, and the value you want is (47*1+44*2+47*4)/(1+2+4) = 46.14...
As I said, PROC SUMMARY makes this easy to do. Although your PROC SQL also gives this answer, PROC SQL will give an incorrect answer in the presence of missing values, while PROC SUMMARY does not have this problem.
proc summary data=have nway;
class contract_id plan_id;
var part_c_d_prem_combine /weight=segment_id1;
output out=want mean=;
run;
Actually its not hard to do using PROC SUMMARY, but its not clear from your description what the weights are. The weights have to be a variable in your data set.
Just to make sure I am understanding this properly
For example C_ID= H15 P_ID= 006 has three rows, with values 47, 44 and 47 and segment IDs of 1, 2 and 4, and the value you want is (47*1+44*2+47*4)/(1+2+4) = 46.14...
As I said, PROC SUMMARY makes this easy to do. Although your PROC SQL also gives this answer, PROC SQL will give an incorrect answer in the presence of missing values, while PROC SUMMARY does not have this problem.
proc summary data=have nway;
class contract_id plan_id;
var part_c_d_prem_combine /weight=segment_id1;
output out=want mean=;
run;
@PaigeMiller thanks for the information!
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.