I want to make a macro loop to calculate percentages for each treatment group and any treatment group within the same study stage. I want to make a loop b/c I have more treatment groups than I care to repeatedly type, and I want a macro b/c I'll need to run the code with multiple datasets. If there is a better way than a loop and/or macro, please let me know!
I currently have the following:
*Code to get dataset;
proc sql;
create table data_ as
/*Treatment Groups*/
select stage, 'Grp' as type, trt, count(distinct patid) as cnt
from enroll_
where not missing(trt)
group by trt
union corr
/*All Subjects*/
select stage, 'All' as type, trt, count(distinct patid) as cnt
from enroll_
where not missing(trt)
group by stage
order by stage, type, trt;
quit;
*Resulting dataset;
data have;
input tab $ type $ trt $ cnt;
cards;
1 All T1 9
1 All T2 9
1 Grp T1 5
1 Grp T2 4
2A All T3 20
2A All T4 20
2A All T5 20
2A All T6 20
2A Grp T3 5
2A Grp T4 5
2A Grp T5 7
2A Grp T6 3
2B All T7 12
2B All T8 12
2B Grp T7 8
2B Grp T8 4
;
run;
I'm fine using the dataset I made, but I feel it may be easier to work with something more along the lines of this:
data want;
input tab $ trt $ cnt;
cards;
1 A1 9
1 T1 5
1 T2 4
2A A2 20
2A T3 5
2A T4 5
2A T5 7
2A T6 3
2B A3 12
2B T7 8
2B T8 4
;
run;
However, I am unsure if there's a direct way to do this. Only thing I can think of is replace TRT value with A1, A2, A3 and delete the extra rows.
I want to make a loop out of the following:
data data;
set data_;
if trt = 'A1' then pct = 100*cnt/&NA1;
if trt = 'T1' then pct = 100*cnt/&NT1;
if trt = 'T2' then pct = 100*cnt/&NT2;
*and so on...;
run;
The total counts are currently saved as macros. Is there a simple way to make the process of referring to &TRT and calculating percent of N&TRT?
*Want this to loop through list of treatments;
if trt = &trt then pct = 100*cnt/N&trt;
Note: Macro variables NA1, NT1, etc. were made in project set-up file (i.e. I did not assign myself).
Thank you in advance!
EDIT: TAB and STAGE are the same
Here's the general set-up of the ENROLL_ dataset. There are a lot more variables, but I only listed the ones that matter.
data enroll_;
input patid $ tab $ grp $;
cards;
CV001 . .
CV002 . .
CV003 1 T1
CV004 1 T2
CV005 2B T8
CV006 2A T3
CV007 1 T2
CV008 2B T8
CV009 . .
CV010 2B T7
CV011 2A T4
CV012 2A T4
CV013 2B T8
CV014 2A T5
CV015 2A T6
CV016 1 T2
CV017 . .
CV018 2B T7
CV019 2A T5
CV020 2A T5
;
run;
/*NOTE: TAB is same as STAGE from SQL code*/
/*NOTE: counts will not match CNT from HAVE dataset since it's too much to type*/
However, there are other datasets aside from ENROLL_ that I will have repeat ID's. In this event, I would be counting how many have at least one deviation (DVCODE) within each treatment group.
For example:
data dv_;
input patid $ tab $ grp $ dvcode $;
cards;
CV003 1 T1 01
CV003 1 T1 03
CV005 2B T8 01
CV005 2B T8 05
CV005 2B T8 02
CV010 2B T7 03
CV020 2A T5 01
CV020 2A T5 04
;
run;
proc sql;
create table data_ as
/*Treatment Groups*/
select tab as type, trt, count(distinct patid) as cnt
from enroll_
where not missing(trt)
group by trt
union corr
/*All Subjects*/
select 'All' as type, trt, count(distinct patid) as cnt
from enroll_
where not missing(trt)
group by tab
order by type, trt;
quit;
Let's deal with one question at a time, it seems like you're asking a few here.
This should get you that first table you wanted instead of the duplicates, does that work with your real data?
If so, what's the issue/relevance of the macro variables.
Then we'll come back to what you mean by multiple tables with multiple ID's.
Is there a way to have type be different for each TAB, or would that need to be a separate chunk of code?
Update: I just realized that the macro variables and values were also saved in a table, so I'm just going to merge by TRT and calculate percent using the two columns. Figured this is a lot less computational power.
Do you need a data set (for further manipulation) or a report (Humans read these)?
Quite often if a report is the actual need than one of the report procedures like Report or Tabulate will take care of the "dynamic" elements of changing group definitions if the variables are defined correctly and may well do it from the base data instead of using sql to partially summarize data first.
Quick example which creates two report tables of different layout.
proc tabulate data=sashelp.class; class sex age; tables sex*(age all='All ages'), n pctn='% of students' pctn<sex>='% of sex'; tables all='All students' sex, (age all='All ages')*(n pctn='% all students' rowpctn="% of sex" colpctn="% of age") ; run;
This and Proc Report will also create data sets but the structure for use is a bit of a learning curve.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.