I have a dataset data have;
input ID Type $ DIS $ Timepoint Answer_1 Answer_2 Payment;
datalines;
1001 0to5 A 0 0 1 100
1001 0to5 A 1 1 0 210
1001 0to5 A 3 1 . 340
1002 0to5 A 0 1 0 290
1002 0to5 A 1 . 1 180
1003 0to5 B 0 1 1 200
1003 0to5 B 2 . 0 150
1004 6to9 A 0 . . 300
1004 6to9 A 1 1 1 100
1004 6to9 A 2 1 . 340
1005 6to9 A 0 1 1 900
1005 6to9 A 1 0 1 100
1005 6to9 A 3 . 0 940
1006 6to9 A 0 1 . 400
1006 6to9 A 1 1 1 450
; I would like the output table: Type DIS MedianPayment_Ans1 MedianPayment_Ans2 Count 0to5 A 250 195 2 0to5 B 200 175 1 6to9 A 370 450 3 The output is: grouped by Type and DIS, with three new columns. Two of the columns are the median payments for each respective answer field, only considering the values which have the respective answer not being missing. The third column has the count of the number of unique ID's which contributed to the median. Constraints are: - There are many ID's, Type's and DIS's - Data set is ordered on ID and Timepoint - For any distinct ID, the first Timepoint is equal to 0, and Timepoint is a nonnegative integer - All answer fields are either 0, 1 or missing (.) . - Payment is a positive integer, possibly missing too. - There are at least 50 answer fields. Could someone help me with this please? Preferably using proc sql. I thought of using some indicator function but it didn't work: proc sql;
create table want as
select Type
,DIS
,median(Payment*(Answer1 ne .)) as medianPayment_Ans1
,median(Payment*(Answer2 ne .)) as medianPayment_Ans2
from have
group by Type, DIS;
quit;
... View more