Solved: Summary table of multiple variables by one variable

lillymaginta · Posted 03-22-2020 10:29 AM

DATA HAVE;
INPUT patientid drug hf db ccb ace me beta plan$;
datalines;
1 1 0 1 0 1 A
2 0 1 0 0 0 B
3 1 1 1 1 1 A
4 0 1 0 1 0 C
5 1 1 1 1 1 A
6 1 0 1 0 1 A
7 0 1 0 0 0 C
8 1 1 1 1 1 B
9 0 1 0 1 0 A
10 0 1 1 1 1 A
;
run;

I want to summarize the table above by group (drug). The Output dataset which contains n(%)

      Drug=1        Dug=0  
Variable
HF       3     60      5  100   
DB
CCB
AC
ME
BETA  
Plan
  A		4     80      2   40  
  B
  C

PaigeMiller · Posted 03-22-2020 11:33 AM

Part of the problem is putting the data into the best layout to produce the analysis you want. This can be done via PROC TRANSPOSE. Then the report you want can be produced via PROC REPORT. Here is my attempt:

DATA HAVE;
infile cards expandtabs;
INPUT patientid drug hf db ccb ace plan$;
if plan='A' then a=1; else a=0;
if plan='B' then b=1; else b=0;
if plan='C' then c=1; else c=0;
drop plan;
datalines;
1 1 0 1 0 1 A
2 0 1 0 0 0 B
3 1 1 1 1 1 A
4 0 1 0 1 0 C
5 1 1 1 1 1 A
6 1 0 1 0 1 A
7 0 1 0 0 0 C
8 1 1 1 1 1 B
9 0 1 0 1 0 A
10 0 1 1 1 1 A
;
run;
proc transpose data=have out=have_t;
    by patientid drug;
    var hf db ccb ace a b c;
run;
proc report data=have_t;
    columns _name_ drug,(col1=col1a col1);
    define _name_/group "NAME" order=data;
    define drug/across "DRUG";
    define col1a/sum "N" format=comma6.0;
    define col1/mean "Percent" format=percent6.0;
run;

--
Paige Miller

View solution in original post

PaigeMiller · Posted 03-22-2020 10:46 AM

Please explain how for HF, you get 3 for drug 1 and 5 for drug zero. Also please explain how the percents are 60% and 100%.

Please explain what is being computed for A. It is not obvious what you want for A, and how you split A between drug 1 and drug 0, and it is not obvious how the percents are 80 and 40.

Also, do you want the table to appear EXACTLY as you have shown it, or can there be reasonable variations from what you have shown?

--
Paige Miller

lillymaginta · Posted 03-22-2020 10:54 AM

I couldn't agree more. Here is the explanation:

There 5 subjects who used drug= 1 and 5 who used drug=0. I want the table to summarize all the other variables within each of the drug category. For example, 3 out of 5 have hf=1 among those who drug=1 (60%) and similarly among those with drug=0, 5 out of 5 had hf=1 (100%).

PaigeMiller · Posted 03-22-2020 11:06 AM

Thank you.

However, I asked three questions, I'd like to know the answer to all of my questions, not just the first question.

--
Paige Miller

PaigeMiller · Posted 03-22-2020 11:33 AM

Part of the problem is putting the data into the best layout to produce the analysis you want. This can be done via PROC TRANSPOSE. Then the report you want can be produced via PROC REPORT. Here is my attempt:

DATA HAVE;
infile cards expandtabs;
INPUT patientid drug hf db ccb ace plan$;
if plan='A' then a=1; else a=0;
if plan='B' then b=1; else b=0;
if plan='C' then c=1; else c=0;
drop plan;
datalines;
1 1 0 1 0 1 A
2 0 1 0 0 0 B
3 1 1 1 1 1 A
4 0 1 0 1 0 C
5 1 1 1 1 1 A
6 1 0 1 0 1 A
7 0 1 0 0 0 C
8 1 1 1 1 1 B
9 0 1 0 1 0 A
10 0 1 1 1 1 A
;
run;
proc transpose data=have out=have_t;
    by patientid drug;
    var hf db ccb ace a b c;
run;
proc report data=have_t;
    columns _name_ drug,(col1=col1a col1);
    define _name_/group "NAME" order=data;
    define drug/across "DRUG";
    define col1a/sum "N" format=comma6.0;
    define col1/mean "Percent" format=percent6.0;
run;

--
Paige Miller

lillymaginta · Posted 03-22-2020 11:46 AM

Thank you Paigemiller! I didn't notice the other 2. Here are the answers:

2-Plan is a variable with 3 categories, A, B, and C. I want the proportion of subjects within each drug category.

3-As long as the table captures the correct proportion and % within each category

PaigeMiller · Posted 03-22-2020 11:53 AM

I add to my explanation of why you need to put the data into a good layout for analysis

You may have heard me (or others) say in the past that for most situations, a long data set is superior to a wide data set. This is a good example. Once the data is in the long format, the analysis is easy.

--
Paige Miller

JIX · Posted 03-22-2020 11:35 AM

I am not sure if I understood exactly what you wanted, but I took a crack at it. Hope it helps you!

Also in your original HAVE dataset, you only had 7 values for 9 variables, so I randomly added 2 values per observation for those 2 variables. I also attached a screenshot of what the report looks like!

DATA HAVE;
INPUT patientid drug hf db ccb ace me beta plan$;
datalines;
1 1 0 1 0 1 1 0 A
2 0 1 0 0 0 0 1 B
3 1 1 1 1 1 1 1 A
4 0 1 0 1 0 0 1 C
5 1 1 1 1 1 0 0 A
6 1 0 1 0 1 1 1 A
7 0 1 0 0 0 0 1 C
8 1 1 1 1 1 1 0 B
9 0 1 0 1 0 1 1 A
10 0 1 1 1 1 0 1 A
;
run;

DATA HAVE1;
INPUT var$ std;
datalines;
hf 0.01
db 0.001
ccb 0.03
ace 0.033
me 0.044
beta 0.0981
plan 0.20
;
run;

proc sql;
create table have_sum as
select drug,sum(hf) as hf_sum,sum(db) as db_sum, sum(ccb) as ccb_sum, 
sum(ace) as ace_sum, sum(me) as me_sum, sum(beta) as beta_sum
from have
group by drug;
quit;

proc sort data=have out=have_sorted;
by drug;
run;

proc transpose data=have_sum out=have_sum_long prefix=drug;
var hf_sum db_sum ccb_sum ace_sum me_sum beta_sum;
id drug;
run;

proc freq data=have;
tables drug*plan/list outpct missing out=plan;
run;

proc sort data=plan out=plan_sorted;
by plan;
run;

proc transpose data=plan_sorted out=plan1(drop=_name_ _label_) prefix=drug;
id drug;
by plan;
var count;
run;

proc sql;
create table merged1 as
select * from
(select var as var1, drug0, drug0/5 format=percent7.2 as pct0, drug1, drug1/5 format=percent7.2 as pct1, std
from have_sum_long as A full join have1 as B
on scan(A._name_,1,"_")=B.var)
outer union corr
select * from 
(select plan as var1, drug0, drug0/5 format=percent7.2 as pct0,drug1, drug1/5 format=percent7.2 as pct1, . as std
from plan1)
;
quit; 

proc report data=merged1;
columns var1 drug0 pct0 drug1 pct1 std;
define var1/display 'Measures';
define drug0/display 'Drug=0';
define pct0/display 'Drug=0 (%)';
define drug1/display 'Drug=1';
define pct1/display 'Drug=1 (%)';
define std/display 'STD';
run;

lillymaginta · Posted 03-22-2020 11:47 AM

Thank you both!

yabwon · Posted 03-23-2020 01:22 PM

Hi,

Solution was already provided but just for fun: Proc Tabulate & ODS listing + a data step:

data have;
  input patientid drug hgb hct ast alt bun rbc ldl ;
cards4;
1 1 0 1 0 1 1 0 0
2 0 1 0 0 0 0 1 0
3 1 1 1 1 1 1 0 0
4 0 1 0 1 0 0 0 1
5 1 1 1 1 1 1 0 0
6 1 0 1 0 1 1 0 0
7 0 1 0 0 0 0 0 1
8 1 1 1 1 1 0 1 0
9 0 1 0 1 0 1 0 0
0 0 1 1 1 1 1 0 0
;;;;
run;quit;

ods listing close;
options ls = MAX nodate nocenter;  
filename t TEMP lrecl=500;
ods listing file = t;

title; footnote;
proc tabulate data = have noseps;
  class drug;
  var hgb hct ast alt bun rbc ldl;
  table (hgb hct ast alt bun rbc ldl),
        drug=" " * (sum=" "*f=best32. mean=" "*f=best32.)
        /
  ;
run;
ods listing close;

data want;
  infile t dlm="|" missover;
  input type : $ n_0 pcnt_0 n_1 pcnt_1;
  format n_: 5.0 pcnt_: percent12.0;
  if nmiss(n_0,pcnt_0,n_1,pcnt_1) = 0;
run;

proc print data = want;
run;
filename t CLEAR;
ods listing;

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

Summary table of multiple variables by one variable

Re: Summary table of multiple variables by one variable

Re: Summary table of multiple variables by one variable

Re: Summary table of multiple variables by one variable

Re: Summary table of multiple variables by one variable

Re: Summary table of multiple variables by one variable

Re: Summary table of multiple variables by one variable

Re: Summary table of multiple variables by one variable

Re: Summary table of multiple variables by one variable

Re: Summary table of multiple variables by one variable

Re: Summary table of multiple variables by one variable

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away