BookmarkSubscribeRSS Feed
penguins0101
Calcite | Level 5

I have a dataset

data have;
input ID Type $ DIS $ Timepoint Answer_1 Answer_2 Payment;
datalines;
1001 0to5 A 0 0 1 100
1001 0to5 A 1 1 0 210
1001 0to5 A 3 1 . 340
1002 0to5 A 0 1 0 290
1002 0to5 A 1 . 1 180
1003 0to5 B 0 1 1 200
1003 0to5 B 2 . 0 150
1004 6to9 A 0 . . 300
1004 6to9 A 1 1 1 100
1004 6to9 A 2 1 . 340
1005 6to9 A 0 1 1 900
1005 6to9 A 1 0 1 100
1005 6to9 A 3 . 0 940
1006 6to9 A 0 1 . 400
1006 6to9 A 1 1 1 450
;


I would like the output table:

TypeDISMedianPayment_Ans1MedianPayment_Ans2Count
0to5A2501952
0to5B2001751
6to9A3704503


The output is: grouped by Type and DIS, with three new columns.

Two of the columns are the median payments for each respective answer field, only considering the values which have the respective answer not being missing.

The third column has the count of the number of unique ID's which contributed to the median.  
  
Constraints are:  
- There are many ID's, Type's and DIS's
- Data set is ordered on ID and Timepoint

- For any distinct ID, the first Timepoint is equal to 0, and Timepoint is a nonnegative integer

- All answer fields are either 0, 1 or missing (.) .

- Payment is a positive integer, possibly missing too.

- There are at least 50 answer fields.

 

Could someone help me with this please? Preferably using proc sql. I thought of using some indicator function but it didn't work:  

 

 

proc sql;
create table want as
select Type
,DIS
,median(Payment*(Answer1 ne .)) as medianPayment_Ans1
,median(Payment*(Answer2 ne .)) as medianPayment_Ans2
from have
group by Type, DIS;
quit;

 

 

 

3 REPLIES 3
Reeza
Super User

The third column has the count of the number of unique ID's which contributed to the median.  

 

Do missing values contribute here in any way?

ballardw
Super User
(Payment*(Answer1 ne .)) 

Is not going to do what you want because SAS returns 0 for false for the Answer1 ne . So Payment*0=0. Making it a valid value for the Median function to consider and getting included in a "count".

 

What is your "count" supposed to be counting? Since you have two variables that might be missing at random it is not clear what it might be counting.

 

The medians by not missing:

proc sql; 
   create table sortofwant as
   select a.type,a.dis,a.MedianPayment_Ans1,b.MedianPayment_Ans2
   from (
         select type,dis, median(payment) as MedianPayment_Ans1
         from (select * from have where not missing(answer_1))
         group by type,dis
        ) as a
        left join
        (
         select type,dis, median(payment) as MedianPayment_Ans2
         from (select * from have where not missing(answer_2))
         group by type,dis
        ) as b
        on a.type=b.type
           and a.dis=b.dis
   ;
quit;

Caution: small example =>small code and if you have a combination of answer_1 that are all missing for a type*dis combination that are not missing for answer_2 then you need something other than a left join to get both groups of summaries.

 

Note that this is likely going to be a bit of a headache if you try to extend it to many Answer periods.

Reeza
Super User

This should scale for you now:

 

data have;
input ID Type $ DIS $ Timepoint Answer_1 Answer_2 Payment;
datalines;
1001 0to5 A 0 0 1 100
1001 0to5 A 1 1 0 210
1001 0to5 A 3 1 . 340
1002 0to5 A 0 1 0 290
1002 0to5 A 1 . 1 180
1003 0to5 B 0 1 1 200
1003 0to5 B 2 . 0 150
1004 6to9 A 0 . . 300
1004 6to9 A 1 1 1 100
1004 6to9 A 2 1 . 340
1005 6to9 A 0 1 1 900
1005 6to9 A 1 0 1 100
1005 6to9 A 3 . 0 940
1006 6to9 A 0 1 . 400
1006 6to9 A 1 1 1 450
;

proc sort data=have;
by TYPE DIS ID;
run;


data have_expanded;
set have;
by type DIS ID;

first_id = first.id;

array pmt_ans(2);
array answer(2) answer_1-answer_2;

do i=1 to dim(answer);
if nmiss(answer(i), payment) = 0 then pmt_ans(i) = payment;
end;

run;

proc means data=have_expanded nway noprint;
by type dis;
var pmt_ans: ;
output out=want sum(first_id) = Num_IDs median =   / autoname;
run;



 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 514 views
  • 0 likes
  • 3 in conversation