I have a following dataset. WHat I want to do is count TOTAL number of "term" per "class". Duplicates are OK. For example Patient ID = 1 has two "term 2" but with different dates so it should be counted twice. Total denominator should be grouped by "Class" denominator. But unique adult and baby should also be counted.
Raw dataset
treatment_group | ID | class | term | date | group | adult | baby |
placebo | 2 | GI related | term 1 | 1/3/2020 | adult | 1 | 0 |
placebo | 2 | GI related | term 2 | 2/1/2020 | adult | 1 | 0 |
placebo | 5 | Heart related | term 2 | 2/4/2020 | baby | 0 | 1 |
placebo | 5 | Heart related | term 3 | 2/5/2020 | baby | 0 | 1 |
treatment | 3 | General disorders | term 1 | 2/2/2019 | adult | 1 | 0 |
treatment | 4 | General disorders | term 2 | 2/3/2020 | adult | 1 | 0 |
treatment | 1 | Heart related | term 1 | 1/2/2019 | baby | 0 | 1 |
treatment | 1 | Heart related | term 2 | 1/2/2019 | baby | 0 | 1 |
treatment | 1 | Heart related | term 2 | 1/4/2019 | baby | 0 | 1 |
Want:
treatment_group | Class | Term | event | total_class_denominator | adult | child |
placebo | GI related | term 1 | 1 | 2 | 1 | 0 |
placebo | GI related | term 2 | 1 | 2 | 1 | 0 |
placebo | Heart related | term 2 | 1 | 2 | 0 | 1 |
placebo | Heart related | term 3 | 1 | 2 | 0 | 1 |
treatment | General disorders | term 1 | 1 | 2 | 1 | 0 |
treatment | General disorders | term 2 | 1 | 2 | 1 | 0 |
treatment | Heart related | term 1 | 1 | 3 | 0 | 1 |
treatment | Heart related | term 2 | 2 | 3 | 0 | 1 |
I tried the following
proc sql;
select treatment_group, class, term,
count(term) as total_event_numerator,
count(class) as total_event_denominator,
sum(age_child) as child,
sum(age_infant) as infant
from test
group by treatment,class, term
quit;
Hi @monday89
here is an attempt to achieve this:
data test;
infile datalines dlm="09"x;
input treatment_group :$20. ID class :$20. term:$20. date:MMDDYY10. group:$20.
adult baby;
datalines;
placebo 2 GI related term 1 1/3/2020 adult 1 0
placebo 2 GI related term 2 2/1/2020 adult 1 0
placebo 5 Heart related term 2 2/4/2020 baby 0 1
placebo 5 Heart related term 3 2/5/2020 baby 0 1
treatment 3 General disorders term 1 2/2/2019 adult 1 0
treatment 4 General disorders term 2 2/3/2020 adult 1 0
treatment 1 Heart related term 1 1/2/2019 baby 0 1
treatment 1 Heart related term 2 1/2/2019 baby 0 1
treatment 1 Heart related term 2 1/4/2019 baby 0 1
;
run;
/* Remove duplicates if the same term occurs for the same patient at the same date*/
proc sort data=test out=test_nodup nodupkey;
by treatment_group ID class term date;
run;
/*Perform calculations*/
proc sql;
create table want as
select a.*, b.total_class_denominator
from (select distinct treatment_group, class, term, adult, baby as child,
count(term) as event
from test
group by treatment_group, class, term) as a
inner join
(select distinct treatment_group, class, term, count(class) as total_class_denominator
from test_nodup
group by treatment_group, class) as b
on a.treatment_group = b.treatment_group and a.class=b.class and a.term=b.term;
quit;
This is a job for PROC SUMMARY
proc summary data=test nway;
class treatment_group class term;
types treatment_group*class treatment_group*class*term;
var adult baby;
output out=_stats_ max= n(adult)=n;
run;
data want;
merge _stats_(where=(_type_=6) rename=(n=total_class_denominator) drop=term adult baby)
_stats_(where=(_type_=7) rename=(n=event));
by treatment_group class;
drop _type_ _freq_;
run;
This is where PROC SUMMARY excels, when you need data aggregated at two (or more) different levels of aggregation.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: