BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Merdock
Quartz | Level 8

Hello,

 

I have the following dataset and I was wondering what would be the shortest, most elegant way to get an output similar to the one in screenshot attached? I'm thinking I can probably get it done the hard way by manipulating a bunch of proc freq statements but was hoping someone has suggestions for a more efficient/shorter way to obtain such a table? Total eligible column contains the number of participants in study=A, whereas Total Enrolled refers to number of participants in study=B. Basically, the idea behind this table is that I want to see, across the 2 age groups, how many of the participants from study A also enrolled in study B.

Capture.PNG

data have;
input patid $ site $ agegrp study $;
datalines;
1001 site1  1 A
1002 site1  1 A
1003 site1  2 A
1004 site1  2 A
1005 site1  2 A
1006 site1  2 A
1006 site1  2 B
2001 site2  1 A
2002 site2  1 A
2003 site2  1 A
2004 site2  1 A
2005 site2  1 A
2005 site2  1 B
2006 site2  1 A
2007 site2  1 A
2008 site2  1 A
2008 site2  1 B
2009 site2  1 A
2009 site2  1 B
2010 site2  2 A
2010 site2  2 B
2011 site2  2 A
2011 site2  2 B
2012 site2  2 A
2012 site2  2 B
2013 site2  2 A
2013 site2  2 B
2014 site2  2 A
2014 site2  2 B
2015 site2  2 A
2015 site2  2 B
2016 site2  2 A
3001 site3  1 A
3002 site3  1 A
3002 site3  1 B
3003 site3  1 A
3003 site3  1 B
3004 site3  2 A
3005 site3  2 A
; 
run;

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Look at this:

data int; /* make numeric variables for summing */
set have;
eligible = (study = "A");
enrolled = (study = "B");
run;

proc summary data=int; /* summarize */
var enrolled eligible;
class site agegrp;
output
  out=want1 (where=(_type_ in (2,3)))
  sum()=
;
run;

data want2; /* create "total" values */
set want1;
if _type_ = 2 then agegrp = 99;
run;

proc format; /* nice text for across variable */
value agegrp
  1 = "Age Group 1"
  2 = "Age Group 2"
  99 = "Total"
;
run;

proc report data=want2;
column site agegrp,(eligible enrolled percent);
define site /group;
define agegrp / "Age Group" across format=agegrp.;
define eligible / "Total Eligible N" analysis sum;
define enrolled / "Total Enrolled N" analysis sum;
define percent / "Enrolled Percent" computed format=percent7.2;
compute percent;
  _c4_ = _c3_ / _c2_;
  _c7_ = _c6_ / _c5_;
  _c10_ = _c9_ / _c8_;
endcomp;
run;

View solution in original post

7 REPLIES 7
andreas_lds
Jade | Level 19

Maybe with proc tabulate, but you won't get count and percentage in one cell.

Kurt_Bremser
Super User

Look at this:

data int; /* make numeric variables for summing */
set have;
eligible = (study = "A");
enrolled = (study = "B");
run;

proc summary data=int; /* summarize */
var enrolled eligible;
class site agegrp;
output
  out=want1 (where=(_type_ in (2,3)))
  sum()=
;
run;

data want2; /* create "total" values */
set want1;
if _type_ = 2 then agegrp = 99;
run;

proc format; /* nice text for across variable */
value agegrp
  1 = "Age Group 1"
  2 = "Age Group 2"
  99 = "Total"
;
run;

proc report data=want2;
column site agegrp,(eligible enrolled percent);
define site /group;
define agegrp / "Age Group" across format=agegrp.;
define eligible / "Total Eligible N" analysis sum;
define enrolled / "Total Enrolled N" analysis sum;
define percent / "Enrolled Percent" computed format=percent7.2;
compute percent;
  _c4_ = _c3_ / _c2_;
  _c7_ = _c6_ / _c5_;
  _c10_ = _c9_ / _c8_;
endcomp;
run;
Merdock
Quartz | Level 8
@Kurt_Bremser, thank you so much, this works perfectly. I do have one small question though: I noticed that if I try to change the value of age group in proc format so that instead of "Age Group 1" and "Age Group 2" I have "10-<25 years" and "25-<40 years", the final table (want2) won't display the age group columns in chronological order anymore, i.e. "10-<25 years" first and then "25-<40 years". Instead, I get the "25-<40 years" column first. How can I change this so that the first group displayed is the "10-<25 years" one?

Thanks again!
Kurt_Bremser
Super User

I can't repeat your behavior; could it be that the unformatted values are different in your real data than the ones you posted here?

Anyway, use an ORDER= option:

proc report data=want2;
column site agegrp,(eligible enrolled percent);
define site /group;
define agegrp / "Age Group" across format=agegrp. order=formatted;
define eligible / "Total Eligible N" analysis sum;
define enrolled / "Total Enrolled N" analysis sum;
define percent / "Enrolled Percent" computed format=percent7.2;
compute percent;
  _c4_ = _c3_ / _c2_;
  _c7_ = _c6_ / _c5_;
  _c10_ = _c9_ / _c8_;
endcomp;
run;

This will work as long as the formatted values sort correctly in lexical order.

Merdock
Quartz | Level 8
@Kurt_Bremser, thank you! Not quite sure what the issue was but it looks like adding the order=internal statement fixed it and is now displaying what I expected.
Reeza
Super User

See this as a starting point. 

https://communities.sas.com/t5/SAS-Communities-Library/Demographic-Table-and-Subgroup-Summary-Macro-...

 

Note: Do you want the must compact code or do you want code that will be easy to modify/update as needed? There's often a big trade off there. 

 

 

Merdock
Quartz | Level 8

@Reeza, thank you for the link, very helpful! Ideally, I'd like to have a combination of both compact and easily updated to apply to other situations but I think that for this case, I'll settle for compactness.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 836 views
  • 5 likes
  • 4 in conversation