Hello,
I have the following dataset and I was wondering what would be the shortest, most elegant way to get an output similar to the one in screenshot attached? I'm thinking I can probably get it done the hard way by manipulating a bunch of proc freq statements but was hoping someone has suggestions for a more efficient/shorter way to obtain such a table? Total eligible column contains the number of participants in study=A, whereas Total Enrolled refers to number of participants in study=B. Basically, the idea behind this table is that I want to see, across the 2 age groups, how many of the participants from study A also enrolled in study B.
data have; input patid $ site $ agegrp study $; datalines; 1001 site1 1 A 1002 site1 1 A 1003 site1 2 A 1004 site1 2 A 1005 site1 2 A 1006 site1 2 A 1006 site1 2 B 2001 site2 1 A 2002 site2 1 A 2003 site2 1 A 2004 site2 1 A 2005 site2 1 A 2005 site2 1 B 2006 site2 1 A 2007 site2 1 A 2008 site2 1 A 2008 site2 1 B 2009 site2 1 A 2009 site2 1 B 2010 site2 2 A 2010 site2 2 B 2011 site2 2 A 2011 site2 2 B 2012 site2 2 A 2012 site2 2 B 2013 site2 2 A 2013 site2 2 B 2014 site2 2 A 2014 site2 2 B 2015 site2 2 A 2015 site2 2 B 2016 site2 2 A 3001 site3 1 A 3002 site3 1 A 3002 site3 1 B 3003 site3 1 A 3003 site3 1 B 3004 site3 2 A 3005 site3 2 A ; run;
Look at this:
data int; /* make numeric variables for summing */
set have;
eligible = (study = "A");
enrolled = (study = "B");
run;
proc summary data=int; /* summarize */
var enrolled eligible;
class site agegrp;
output
out=want1 (where=(_type_ in (2,3)))
sum()=
;
run;
data want2; /* create "total" values */
set want1;
if _type_ = 2 then agegrp = 99;
run;
proc format; /* nice text for across variable */
value agegrp
1 = "Age Group 1"
2 = "Age Group 2"
99 = "Total"
;
run;
proc report data=want2;
column site agegrp,(eligible enrolled percent);
define site /group;
define agegrp / "Age Group" across format=agegrp.;
define eligible / "Total Eligible N" analysis sum;
define enrolled / "Total Enrolled N" analysis sum;
define percent / "Enrolled Percent" computed format=percent7.2;
compute percent;
_c4_ = _c3_ / _c2_;
_c7_ = _c6_ / _c5_;
_c10_ = _c9_ / _c8_;
endcomp;
run;
Maybe with proc tabulate, but you won't get count and percentage in one cell.
Look at this:
data int; /* make numeric variables for summing */
set have;
eligible = (study = "A");
enrolled = (study = "B");
run;
proc summary data=int; /* summarize */
var enrolled eligible;
class site agegrp;
output
out=want1 (where=(_type_ in (2,3)))
sum()=
;
run;
data want2; /* create "total" values */
set want1;
if _type_ = 2 then agegrp = 99;
run;
proc format; /* nice text for across variable */
value agegrp
1 = "Age Group 1"
2 = "Age Group 2"
99 = "Total"
;
run;
proc report data=want2;
column site agegrp,(eligible enrolled percent);
define site /group;
define agegrp / "Age Group" across format=agegrp.;
define eligible / "Total Eligible N" analysis sum;
define enrolled / "Total Enrolled N" analysis sum;
define percent / "Enrolled Percent" computed format=percent7.2;
compute percent;
_c4_ = _c3_ / _c2_;
_c7_ = _c6_ / _c5_;
_c10_ = _c9_ / _c8_;
endcomp;
run;
I can't repeat your behavior; could it be that the unformatted values are different in your real data than the ones you posted here?
Anyway, use an ORDER= option:
proc report data=want2;
column site agegrp,(eligible enrolled percent);
define site /group;
define agegrp / "Age Group" across format=agegrp. order=formatted;
define eligible / "Total Eligible N" analysis sum;
define enrolled / "Total Enrolled N" analysis sum;
define percent / "Enrolled Percent" computed format=percent7.2;
compute percent;
_c4_ = _c3_ / _c2_;
_c7_ = _c6_ / _c5_;
_c10_ = _c9_ / _c8_;
endcomp;
run;
This will work as long as the formatted values sort correctly in lexical order.
See this as a starting point.
Note: Do you want the must compact code or do you want code that will be easy to modify/update as needed? There's often a big trade off there.
@Reeza, thank you for the link, very helpful! Ideally, I'd like to have a combination of both compact and easily updated to apply to other situations but I think that for this case, I'll settle for compactness.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.