DATA HAVE;
INPUT patientid drug hf db ccb ace me beta plan$;
datalines;
1 1 0 1 0 1 A
2 0 1 0 0 0 B
3 1 1 1 1 1 A
4 0 1 0 1 0 C
5 1 1 1 1 1 A
6 1 0 1 0 1 A
7 0 1 0 0 0 C
8 1 1 1 1 1 B
9 0 1 0 1 0 A
10 0 1 1 1 1 A
;
run;
I want to summarize the table above by group (drug). The Output dataset which contains n(%)
Drug=1 Dug=0
Variable
HF 3 60 5 100
DB
CCB
AC
ME
BETA
Plan
A 4 80 2 40
B
C
Part of the problem is putting the data into the best layout to produce the analysis you want. This can be done via PROC TRANSPOSE. Then the report you want can be produced via PROC REPORT. Here is my attempt:
DATA HAVE;
infile cards expandtabs;
INPUT patientid drug hf db ccb ace plan$;
if plan='A' then a=1; else a=0;
if plan='B' then b=1; else b=0;
if plan='C' then c=1; else c=0;
drop plan;
datalines;
1 1 0 1 0 1 A
2 0 1 0 0 0 B
3 1 1 1 1 1 A
4 0 1 0 1 0 C
5 1 1 1 1 1 A
6 1 0 1 0 1 A
7 0 1 0 0 0 C
8 1 1 1 1 1 B
9 0 1 0 1 0 A
10 0 1 1 1 1 A
;
run;
proc transpose data=have out=have_t;
by patientid drug;
var hf db ccb ace a b c;
run;
proc report data=have_t;
columns _name_ drug,(col1=col1a col1);
define _name_/group "NAME" order=data;
define drug/across "DRUG";
define col1a/sum "N" format=comma6.0;
define col1/mean "Percent" format=percent6.0;
run;
Please explain how for HF, you get 3 for drug 1 and 5 for drug zero. Also please explain how the percents are 60% and 100%.
Please explain what is being computed for A. It is not obvious what you want for A, and how you split A between drug 1 and drug 0, and it is not obvious how the percents are 80 and 40.
Also, do you want the table to appear EXACTLY as you have shown it, or can there be reasonable variations from what you have shown?
I couldn't agree more. Here is the explanation:
There 5 subjects who used drug= 1 and 5 who used drug=0. I want the table to summarize all the other variables within each of the drug category. For example, 3 out of 5 have hf=1 among those who drug=1 (60%) and similarly among those with drug=0, 5 out of 5 had hf=1 (100%).
Thank you.
However, I asked three questions, I'd like to know the answer to all of my questions, not just the first question.
Part of the problem is putting the data into the best layout to produce the analysis you want. This can be done via PROC TRANSPOSE. Then the report you want can be produced via PROC REPORT. Here is my attempt:
DATA HAVE;
infile cards expandtabs;
INPUT patientid drug hf db ccb ace plan$;
if plan='A' then a=1; else a=0;
if plan='B' then b=1; else b=0;
if plan='C' then c=1; else c=0;
drop plan;
datalines;
1 1 0 1 0 1 A
2 0 1 0 0 0 B
3 1 1 1 1 1 A
4 0 1 0 1 0 C
5 1 1 1 1 1 A
6 1 0 1 0 1 A
7 0 1 0 0 0 C
8 1 1 1 1 1 B
9 0 1 0 1 0 A
10 0 1 1 1 1 A
;
run;
proc transpose data=have out=have_t;
by patientid drug;
var hf db ccb ace a b c;
run;
proc report data=have_t;
columns _name_ drug,(col1=col1a col1);
define _name_/group "NAME" order=data;
define drug/across "DRUG";
define col1a/sum "N" format=comma6.0;
define col1/mean "Percent" format=percent6.0;
run;
Thank you Paigemiller! I didn't notice the other 2. Here are the answers:
2-Plan is a variable with 3 categories, A, B, and C. I want the proportion of subjects within each drug category.
3-As long as the table captures the correct proportion and % within each category
I add to my explanation of why you need to put the data into a good layout for analysis
You may have heard me (or others) say in the past that for most situations, a long data set is superior to a wide data set. This is a good example. Once the data is in the long format, the analysis is easy.
I am not sure if I understood exactly what you wanted, but I took a crack at it. Hope it helps you!
Also in your original HAVE dataset, you only had 7 values for 9 variables, so I randomly added 2 values per observation for those 2 variables. I also attached a screenshot of what the report looks like!
DATA HAVE;
INPUT patientid drug hf db ccb ace me beta plan$;
datalines;
1 1 0 1 0 1 1 0 A
2 0 1 0 0 0 0 1 B
3 1 1 1 1 1 1 1 A
4 0 1 0 1 0 0 1 C
5 1 1 1 1 1 0 0 A
6 1 0 1 0 1 1 1 A
7 0 1 0 0 0 0 1 C
8 1 1 1 1 1 1 0 B
9 0 1 0 1 0 1 1 A
10 0 1 1 1 1 0 1 A
;
run;
DATA HAVE1;
INPUT var$ std;
datalines;
hf 0.01
db 0.001
ccb 0.03
ace 0.033
me 0.044
beta 0.0981
plan 0.20
;
run;
proc sql;
create table have_sum as
select drug,sum(hf) as hf_sum,sum(db) as db_sum, sum(ccb) as ccb_sum,
sum(ace) as ace_sum, sum(me) as me_sum, sum(beta) as beta_sum
from have
group by drug;
quit;
proc sort data=have out=have_sorted;
by drug;
run;
proc transpose data=have_sum out=have_sum_long prefix=drug;
var hf_sum db_sum ccb_sum ace_sum me_sum beta_sum;
id drug;
run;
proc freq data=have;
tables drug*plan/list outpct missing out=plan;
run;
proc sort data=plan out=plan_sorted;
by plan;
run;
proc transpose data=plan_sorted out=plan1(drop=_name_ _label_) prefix=drug;
id drug;
by plan;
var count;
run;
proc sql;
create table merged1 as
select * from
(select var as var1, drug0, drug0/5 format=percent7.2 as pct0, drug1, drug1/5 format=percent7.2 as pct1, std
from have_sum_long as A full join have1 as B
on scan(A._name_,1,"_")=B.var)
outer union corr
select * from
(select plan as var1, drug0, drug0/5 format=percent7.2 as pct0,drug1, drug1/5 format=percent7.2 as pct1, . as std
from plan1)
;
quit;
proc report data=merged1;
columns var1 drug0 pct0 drug1 pct1 std;
define var1/display 'Measures';
define drug0/display 'Drug=0';
define pct0/display 'Drug=0 (%)';
define drug1/display 'Drug=1';
define pct1/display 'Drug=1 (%)';
define std/display 'STD';
run;
Thank you both!
Hi,
data have;
input patientid drug hgb hct ast alt bun rbc ldl ;
cards4;
1 1 0 1 0 1 1 0 0
2 0 1 0 0 0 0 1 0
3 1 1 1 1 1 1 0 0
4 0 1 0 1 0 0 0 1
5 1 1 1 1 1 1 0 0
6 1 0 1 0 1 1 0 0
7 0 1 0 0 0 0 0 1
8 1 1 1 1 1 0 1 0
9 0 1 0 1 0 1 0 0
0 0 1 1 1 1 1 0 0
;;;;
run;quit;
ods listing close;
options ls = MAX nodate nocenter;
filename t TEMP lrecl=500;
ods listing file = t;
title; footnote;
proc tabulate data = have noseps;
class drug;
var hgb hct ast alt bun rbc ldl;
table (hgb hct ast alt bun rbc ldl),
drug=" " * (sum=" "*f=best32. mean=" "*f=best32.)
/
;
run;
ods listing close;
data want;
infile t dlm="|" missover;
input type : $ n_0 pcnt_0 n_1 pcnt_1;
format n_: 5.0 pcnt_: percent12.0;
if nmiss(n_0,pcnt_0,n_1,pcnt_1) = 0;
run;
proc print data = want;
run;
filename t CLEAR;
ods listing;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.