Hi guys, in order of reproducing the next table (forget Obs, only col0 placebo drug and total), I wrote the code below .
Obs | col0 | placebo | drug | total |
1 | No of Subjects | 19 | 21 | 40 | |||
2 | race | ||||||
3 | Oriental | 0 | 0.00% | 1 | 4.76% | 1 | 2.50% |
4 | Asian | 1 | 5.26% | 0 | 0.00% | 1 | 2.50% |
5 | Black | 7 36.84% | 7 33.33% | 14 35.00% | |||
6 | Caucasian | 11 57.89% | 13 61.90% | 24 60.00% | |||
7 | sex | ||||||
8 | Female | 9 47.37% | 10 47.62% | 19 47.50% | |||
9 | Male | 10 52.63% | 11 52.38% | 21 52.50% | |||
10 | Height | ||||||
11 | MIN | 137.5 | 133.3 | 133.3 | |||
12 | MEAN | 150.4 | 147.6 | 148.9 | |||
13 | MAX | 162.4 | 154.1 | 162.4 | |||
14 | N | 19 | 21 | 40 | |||
15 | STD | 7.09 | 5.40 | 6.34 | |||
16 | Age | ||||||
17 | MIN | 11.0 | 19.0 | 11.0 | |||
18 | MEAN | 25.6 | 30.8 | 28.3 | |||
19 | N | 19 | 21 | 40 | |||
20 | MAX | 50.0 | 56.0 | 56.0 | |||
21 | STD | 8.78 | 8.33 | 8.84 | |||
22 | Weight | ||||||
23 | N | 19 | 21 | 40 | |||
24 | MIN | 52.7 | 54.8 | 52.7 | |||
25 | MEAN | 64.3 | 66.8 | 65.6 | |||
26 | STD | 7.50 | 7.45 | 7.48 | |||
27 | MAX | 75.5 | 85.1 | 85.1 |
Please, could you tell me if I could simplify some step in the next code (because, I think is a bit long, and maybe I could avoid some steps)?
The code use three macros:
1) tablesc...calculate sex, and race
2) statx...calculate Height, Weight and Age.
3) transpos...Transposing the tables.
.....
procedures used:
1) proc sql:
a)in order to summarize the variables in (placebo, drug and total) avoiding proc freq. and proc transpose.
b)add extra row
2)proc means:..to calculate min, max, mean and std
3)proc transpose: to get the right structure of the final table
4) proc append: to concatenate tables.
-------------------------------------------------
Source dataset newtot:
data newtot;
length sex race trt $20;
length subno weight height age 8;
input subno weight height sex race age trt;
datalines;
1 63.7 141.85 Male Caucasian 19 Drug
2 72.6 149.22 Female Black 27 Placebo
3 60.7 147.64 Female Caucasian 26 Placebo
4 64.2 150.93 Male Caucasian 21 Placebo
5 60.6 153.56 Male Asian 41 Placebo
6 66.8 139.30 Male Caucasian 50 Placebo
7 75.5 147.93 Female Caucasian 24 Drug
8 55.8 137.51 Female Caucasian 21 Placebo
9 73.7 146.30 Male Caucasian 28 Drug
10 68.2 133.28 Male Caucasian 24 Drug
11 56.3 148.04 Female Caucasian 30 Drug
12 60.7 141.53 Male Black 29 Drug
13 63.2 147.19 Female Caucasian 21 Placebo
14 74.7 155.35 Male Black 20 Placebo
15 70.8 152.61 Female Caucasian 33 Drug
16 71.4 147.75 Male Black 37 Drug
17 60.1 162.40 Female Caucasian 28 Placebo
18 53.4 152.24 Male Caucasian 19 Placebo
19 65.4 142.22 Male Caucasian 37 Drug
20 52.7 161.64 Female Black 34 Placebo
21 57.9 153.62 Female Black 33 Drug
22 62.2 147.84 Male Black 29 Drug
23 72.9 139.20 Male Black 21 Placebo
24 67.4 143.55 Male Caucasian 20 Placebo
25 69.4 154.10 Female Caucasian 31 Drug
26 72.1 149.09 Male Caucasian 36 Drug
27 64.6 152.56 Female Black 25 Placebo
28 75.1 155.37 Female Caucasian 19 Placebo
29 63.1 153.64 Female Black 37 Drug
30 54.8 149.17 Female Black 34 Drug
31 75.5 149.39 Female Black 26 Placebo
32 72.7 149.27 Male Caucasian 29 Drug
33 68.2 149.57 Male Caucasian 11 Placebo
34 69.0 152.04 Female Caucasian 20 Drug
35 57.9 151.13 Male Caucasian 31 Placebo
36 61.5 138.67 Male Caucasian 56 Drug
37 55.5 159.60 Male Black 25 Placebo
38 71.0 148.42 Male Oriental 19 Drug
39 85.1 152.91 Female Black 37 Drug
40 58.2 148.92 Female Caucasian 24 Drug
;
run;
My Code:
---------------------------------------------------------------------------------------------------------------------------------------------
options symbolgen mprint mlogic;
*using proc sql to summarize sex and race according to the final table....avoiding proc freq and proc transpose;
proc sql;
create table table0n as
select 'No of Subjects' as col0, sum(trt='Drug') as Drug, sum(trt='Placebo') as Placebo, count(trt) as total
from newtot;
quit;
*creating global macrovariables for porcentages;
data null;
set table0n;
call symput('drugx',drug);
call symput('placebox',placebo);
call symput('totalx',total);
run;
%put &drugx;
%put &placebox;
%put &totalx;
*convert placebo, drug and total to character variables;
data table0c (rename=(drugc=drug placeboc=placebo totalc=total)drop=drug placebo total);
set table0n;
drugc=put(drug,best12.);
placeboc=put(placebo,best12.);
totalc=put(total,best12.);
run;
*macro for generate tablecx, x=1 sex, x=2 race ;
%macro tablesc(num=,var=);
proc sql;
create table table&num as
select &var as col0, sum(trt='Drug') as Drug, sum(trt='Placebo') as Placebo, count(trt) as total
from newtot
group by &var
order by &var descending;
quit;
*change numeric variables to character variables;
data tablec&num (rename=(drugc=drug placeboc=placebo totalc=total)drop=drug placebo total);
length drugc $40 placeboc $40 totalc $40;
set table#
drugc=put(drug,best12.)||' ' || put(drug/&drugx,percent9.2);
placeboc=put(placebo,best12.)||' ' || put(placebo/&placebox,percent9.2);
totalc=put(total,best12.)||' ' || put(total/&totalx,percent9.2);
run;
*Adding new record (Header of table);
proc sql;
insert into tablec&num
(Drug,Placebo, total, Col0)
values
(' ',' ', ' ', "&var");
quit;
proc sort data=tablec# by total; run; *Fixing the Header at beginning of the table;
*accumulating tables sex and race to get the final table;
proc append base=name0ok data=tablec# run;
%mend;
*executing macro to get the table(sex,race) = tablectotal;
%tablesc(num=1,var=race);
%tablesc(num=2,var=sex);
*macro to generate stat table tablestatx, x=1 height, x=2 age, x=3 weight;
%macro statx(var=,out=);
*sorting the source table by trt before use proc means;
proc sort data=newtot out=newtots;
by trt;
run;
*proc means by trt;
proc means data=newtots noprint;
var &var;
output out=newstat (rename=(_stat_=col0) drop=_type_ _freq_);
by trt;
run;
*proc means (total);
proc means data=newtots noprint;
var &var;
output out=newstat2(rename=(_stat_=col0) drop=_type_ _freq_);
run;
*concatenaiting the tables (total+drug);
data totnewstat;
set newstat newstat2;
if trt= ' ' then trt='Total';
run;
*sorting these tables to use proc transpose;
proc sort data=totnewstat out=totnewstat&out (rename=(&var=col1));
by col0;
run;
%mend;
%statx(var=height,out=h);
%statx(var=weight,out=w);
%statx(var=age,out=a);
*In this macro, we will traspose the (drug+total) statistical tables;
%macro transpos(in=,name=,var=);
*formating the these tables;
data newstatformat (drop=col1);
set totnewstat∈
if col0 in ('MEAN','MAX','MIN') then value=strip(put(col1,6.1));
else if col0 in ('STD') then value=strip(put(col1,6.2));
else value=strip(put(col1,6.0));
run;
*transposing these tables;
proc transpose data=newstatformat out=&name (drop=_name_);
by col0;
id trt;
var value;
run;
*Adding new record (Header of table);
proc sql;
insert into &name
(Drug,Placebo,Total,col0)
values
(' ',' ', ' ', "&var");
quit;
proc sort data=&name; by total; run; *Fixing the Header at beginning of the table;
*accumulating tables sex and race to get the final table;
proc append base=nameok data=&name; run;
%mend;
%transpos(in=h,name=newhstat,var=Height);
%transpos(in=a,name=newastat,var=Age);
%transpos(in=w,name=newwstat,var=Weight);
data table;
length col0 $40 drug $40 placebo $40 total $40;
set table0c Name0ok Nameok;
run;
proc sql;
create table tabledemog as
select col0 label '------', placebo, drug, total
from table;
quit;
proc print data=tabledemog; run;
-------------------------------------------------------------------------------------
Thanks.
V
Don't be loopy. You are looping on variable names that's not good.
Process all discrete variables at once (one call to PROC FREQ, or method of your choice). Process all continuous variables at once (one call to PROC SUMMARY). This may actually make the code seem more complex but it will be more flexible in the long term.
Make everything more dynamic and data driven. Let the program figure out how many. You can write a program that accepts a few macro variable as input to make this table.
data=
continuous=
discrete=
trt=
variableOrderInOutput=
Use existing meta data LABEL and FORMAT to label rows and columns.
Derive decimal precision for statistics from the format D of the associated variable.
Don't put summary statistic into macro variables.
Hi Data_null, please, could you write a short code more efficient for this table?
I can learn from there.
Thanks.
V.
You could use something like this..
proc summary data=sashelp.class;
class sex;
var age height weight;
output out=stats;
run;
This would give you all the continuous variables summarized in one pass of the data. Then you can figure ways to transform/transpose the data into the stats in rows structure you need. When I do this I also convert the statistics to character using the format+decimal offset for each statistic. N=Integer, MEAN STD + 1 dp, MIN MAX + 0 same precision as variable. You can do this with PROC TRANSPOSE by transposing the numeric statistics with a character variable to cause a conversion to character. You will need to code gen a FORMAT statement to add the offset to the D of each variable associated format. The W is not so important as long as it is wide enough, 16 should suffice most of the time.
Thanks, it helps.
My idea of using macro was to make it more simple if you have many discrete (class variables) or continuous (var variables) involved.
The idea of using proc sql was to get an easy way to summarize (drug, placebo, and total), rather than: sorting, summarizing (proc summary or proc freq) and transposing, to get the same result.
As the number of variables grows that is all the more reason why you should process them all at once instead of looping on each variable. You don't want to keep reading the same data over and over.
Your SQL may seem simple but it has hard coded values. It may take more code to make the program data driven but in the end it will be worth the effort.
When you get ride of all the loopy parts and start using the meta data properly and have the results data drive you can then wrap it all up into a macro that will be amazingly powerful, easy to use and worth all the effort.
Hi Data_null_, thanks for your comment...about your last comment
1. Please could you explain what are the loopy parts are in my code?...sorry but, I don't understand...I understand loop as a (do i=1 to x end) but, I would like to know
what you mean by loop in my code.
2. meta data?...again, what does "meta data" mean?
3. and one last question ...wrap it all up in a macro that will be amazing?...sounds good but I would like an example to understand it better.
Thank you very much for your time.
V
michtka wrote:
Hi Data_null_, thanks for your comment...about your last comment
1. Please could you explain what are the loopy parts are in my code?...sorry but, I don't understand...I understand loop as a (do i=1 to x end) but, I would like to know
what you mean by loop in my code.
%transpos(in=h,name=newhstat,var=Height);
%transpos(in=a,name=newastat,var=Age);
%transpos(in=w,name=newwstat,var=Weight);
This is loopy. You read all the input data for each variable. You need to use ONE call to PROC SUMMARY to get summary stats for all variable at one time. Similar for discrete variable do them all at once.
2. meta data?...again, what does "meta data" mean? LABELS and FORMATS3. and one last question ...wrap it all up in a macro that will be amazing?...sounds good but I would like an example to understand it better. You code is not ready to become a macro yet.
Thank you very much for your time.
V
Interesting...thanks.
For example, if you see the macro %tablesc for the discrete variables:
proc sql: I summarize the discrete variable (drug, placebo and total) in a table&num (being num the parameter I change in the macro)
dataset: I change from numeric to character format in a tablec&num
proc sql: I add a new record (header of the tables)
I think, maybe is a long code, but it is clear what I am doing.
Maybe you are right, but I do not find at the moment a more easy way to make (for example) this discrete table.
For discrete variables considers and approach that uses the output from PROC FREQ.
data class;
set sashelp.class;
trt = rantbl(123,.5);
run;
ods output CrossTabFreqs=Counts;
proc freq data=class;
tables trt*(age sex);
run;
ods output close;
proc print data=counts;
run;
Dear data_null_
I know what you mean...but I prefer use proc sql for this demog table (placebo drug total).
Just now, I understood to use it for summarizing avoiding (proc sort proc freq and proc transpose).
I know these two procedure are more intuitive to visualize, but proc sql make the job in a mechanical form.
Thank you for your help. I will keep in mine your expert terminology.
Thanks,
V
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.