About penguins0101

penguins0101 · ‎07-26-2022

I have a dataset data have; input ID Type $ DIS $ Timepoint Answer_1 Answer_2 Payment; datalines; 1001 0to5 A 0 0 1 100 1001 0to5 A 1 1 0 210 1001 0to5 A 3 1 . 340 1002 0to5 A 0 1 0 290 1002 0to5 A 1 . 1 180 1003 0to5 B 0 1 1 200 1003 0to5 B 2 . 0 150 1004 6to9 A 0 . . 300 1004 6to9 A 1 1 1 100 1004 6to9 A 2 1 . 340 1005 6to9 A 0 1 1 900 1005 6to9 A 1 0 1 100 1005 6to9 A 3 . 0 940 1006 6to9 A 0 1 . 400 1006 6to9 A 1 1 1 450 ; I would like the output table: Type DIS MedianPayment_Ans1 MedianPayment_Ans2 Count 0to5 A 250 195 2 0to5 B 200 175 1 6to9 A 370 450 3 The output is: grouped by Type and DIS, with three new columns. Two of the columns are the median payments for each respective answer field, only considering the values which have the respective answer not being missing. The third column has the count of the number of unique ID's which contributed to the median. Constraints are: - There are many ID's, Type's and DIS's - Data set is ordered on ID and Timepoint - For any distinct ID, the first Timepoint is equal to 0, and Timepoint is a nonnegative integer - All answer fields are either 0, 1 or missing (.) . - Payment is a positive integer, possibly missing too. - There are at least 50 answer fields. Could someone help me with this please? Preferably using proc sql. I thought of using some indicator function but it didn't work: proc sql; create table want as select Type ,DIS ,median(Payment*(Answer1 ne .)) as medianPayment_Ans1 ,median(Payment*(Answer2 ne .)) as medianPayment_Ans2 from have group by Type, DIS; quit;

Online Status	Offline
Date Last Visited	‎07-26-2022 06:49 PM

Summary functions conditioning on another column, proc sql

Summary functions conditioning on another column, proc sql