BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Merdock
Quartz | Level 8

Hello,

 

I have the following dataset and I was wondering what would be the shortest, most elegant way to get an output similar to the one in screenshot attached? I'm thinking I can probably get it done the hard way by manipulating a bunch of proc freq statements but was hoping someone has suggestions for a more efficient/shorter way to obtain such a table? Total eligible column contains the number of participants in study=A, whereas Total Enrolled refers to number of participants in study=B. Basically, the idea behind this table is that I want to see, across the 2 age groups, how many of the participants from study A also enrolled in study B.

Capture.PNG

data have;
input patid $ site $ agegrp study $;
datalines;
1001 site1  1 A
1002 site1  1 A
1003 site1  2 A
1004 site1  2 A
1005 site1  2 A
1006 site1  2 A
1006 site1  2 B
2001 site2  1 A
2002 site2  1 A
2003 site2  1 A
2004 site2  1 A
2005 site2  1 A
2005 site2  1 B
2006 site2  1 A
2007 site2  1 A
2008 site2  1 A
2008 site2  1 B
2009 site2  1 A
2009 site2  1 B
2010 site2  2 A
2010 site2  2 B
2011 site2  2 A
2011 site2  2 B
2012 site2  2 A
2012 site2  2 B
2013 site2  2 A
2013 site2  2 B
2014 site2  2 A
2014 site2  2 B
2015 site2  2 A
2015 site2  2 B
2016 site2  2 A
3001 site3  1 A
3002 site3  1 A
3002 site3  1 B
3003 site3  1 A
3003 site3  1 B
3004 site3  2 A
3005 site3  2 A
; 
run;

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Look at this:

data int; /* make numeric variables for summing */
set have;
eligible = (study = "A");
enrolled = (study = "B");
run;

proc summary data=int; /* summarize */
var enrolled eligible;
class site agegrp;
output
  out=want1 (where=(_type_ in (2,3)))
  sum()=
;
run;

data want2; /* create "total" values */
set want1;
if _type_ = 2 then agegrp = 99;
run;

proc format; /* nice text for across variable */
value agegrp
  1 = "Age Group 1"
  2 = "Age Group 2"
  99 = "Total"
;
run;

proc report data=want2;
column site agegrp,(eligible enrolled percent);
define site /group;
define agegrp / "Age Group" across format=agegrp.;
define eligible / "Total Eligible N" analysis sum;
define enrolled / "Total Enrolled N" analysis sum;
define percent / "Enrolled Percent" computed format=percent7.2;
compute percent;
  _c4_ = _c3_ / _c2_;
  _c7_ = _c6_ / _c5_;
  _c10_ = _c9_ / _c8_;
endcomp;
run;

View solution in original post

7 REPLIES 7
andreas_lds
Jade | Level 19

Maybe with proc tabulate, but you won't get count and percentage in one cell.

Kurt_Bremser
Super User

Look at this:

data int; /* make numeric variables for summing */
set have;
eligible = (study = "A");
enrolled = (study = "B");
run;

proc summary data=int; /* summarize */
var enrolled eligible;
class site agegrp;
output
  out=want1 (where=(_type_ in (2,3)))
  sum()=
;
run;

data want2; /* create "total" values */
set want1;
if _type_ = 2 then agegrp = 99;
run;

proc format; /* nice text for across variable */
value agegrp
  1 = "Age Group 1"
  2 = "Age Group 2"
  99 = "Total"
;
run;

proc report data=want2;
column site agegrp,(eligible enrolled percent);
define site /group;
define agegrp / "Age Group" across format=agegrp.;
define eligible / "Total Eligible N" analysis sum;
define enrolled / "Total Enrolled N" analysis sum;
define percent / "Enrolled Percent" computed format=percent7.2;
compute percent;
  _c4_ = _c3_ / _c2_;
  _c7_ = _c6_ / _c5_;
  _c10_ = _c9_ / _c8_;
endcomp;
run;
Merdock
Quartz | Level 8
@Kurt_Bremser, thank you so much, this works perfectly. I do have one small question though: I noticed that if I try to change the value of age group in proc format so that instead of "Age Group 1" and "Age Group 2" I have "10-<25 years" and "25-<40 years", the final table (want2) won't display the age group columns in chronological order anymore, i.e. "10-<25 years" first and then "25-<40 years". Instead, I get the "25-<40 years" column first. How can I change this so that the first group displayed is the "10-<25 years" one?

Thanks again!
Kurt_Bremser
Super User

I can't repeat your behavior; could it be that the unformatted values are different in your real data than the ones you posted here?

Anyway, use an ORDER= option:

proc report data=want2;
column site agegrp,(eligible enrolled percent);
define site /group;
define agegrp / "Age Group" across format=agegrp. order=formatted;
define eligible / "Total Eligible N" analysis sum;
define enrolled / "Total Enrolled N" analysis sum;
define percent / "Enrolled Percent" computed format=percent7.2;
compute percent;
  _c4_ = _c3_ / _c2_;
  _c7_ = _c6_ / _c5_;
  _c10_ = _c9_ / _c8_;
endcomp;
run;

This will work as long as the formatted values sort correctly in lexical order.

Merdock
Quartz | Level 8
@Kurt_Bremser, thank you! Not quite sure what the issue was but it looks like adding the order=internal statement fixed it and is now displaying what I expected.
Reeza
Super User

See this as a starting point. 

https://communities.sas.com/t5/SAS-Communities-Library/Demographic-Table-and-Subgroup-Summary-Macro-...

 

Note: Do you want the must compact code or do you want code that will be easy to modify/update as needed? There's often a big trade off there. 

 

 

Merdock
Quartz | Level 8

@Reeza, thank you for the link, very helpful! Ideally, I'd like to have a combination of both compact and easily updated to apply to other situations but I think that for this case, I'll settle for compactness.