Hello programmers,
I would like to create a manuscript table without using a proc report. I've made progress but I'd be glad to have someone look at the code. The dataset is in excel attached to this message. I would want the final table to look like below.
Please I'd be glad if there's anyone who can help me complete my code so that I can create the table below.
data Aa; set A;
array names $14 n1-n6
('A' 'B' 'C' 'D'
'E' 'F' );
array drugs A B C D E F ;
do over drugs;
d_value = drugs;
d_order = _I_ ;
d_name = names;
output; end;
proc print;
var id d_order d_name d_value
stroke hrt_failure ;
run;
data Aaa; set Aa;
run;
data A1; set Aaa;
output;
female = 99 ; * everyone;
output;
proc format;
value femalefm
0 ='Male'
1 ='Female'
99 = 'All patients';
;
proc sort;
by female d_order d_name ;
proc freq noprint data = A1;
by female d_order d_name;
table d_value*stroke/
out = Dataset1 outpct cmh;
output out = Dataset3 cmh ;
proc freq noprint data = A1 ;
by female d_order d_name;
table d_value*hrt_failure/out= Dataset2
outpct cmh;
output out = Dataset4 cmh ;
run;
/*So i am trying to create the table below for each gender.
That is for female=0,1 and 99. So that i have the freqency and odds ratio for each of the drugs (A B C D E F)*/
/*so i merged dataset 1, 2, 3 and 4. Data set 1 and 2 have the row and col % for stroke and
hrt_failure respectively, while dataset 3 and 4 how the RR and 95% CI for stroke and hrt_failure
respectively */
data dataset1x; set dataset1; drop PCT_ROW PERCENT; run;
data dataset2x;set dataset2;drop PCT_ROW PERCENT; run;
data dataset3x;set dataset3; (keep =female d_name d_order _MHOR_ L_MHOR U_MHOR);run;
data dataset4x;set out4 (keep =female d_name d_order _MHOR_ L_MHOR U_MHOR);run;
proc sort data = dataset1x; by d_order d_name ;run;
proc sort data = dataset2x; by d_order d_name ;run;
proc sort data = dataset3x; by d_order d_name ;run;
proc sort data = dataset4x; by d_order d_name ;run;
data merged_1_4;
merge dataset1x dataset2x dataset3x dataset4x;
by d_order d_name female; run;
/*i am literally stuck here*/
data merged_1_4A;
set merged_1_4;
options nodate nonumber;
file print notitles ;
if _N_ =1 then do;
put @45 'female';
put @1 'stroke' @45 'PCT_COL' 4.
@50 '(' PCT_COL 5.2 ')' @60 '_MHOR_' @70 '(95% CI)';
put @1 'hrt_failure' @45 'PCT_COL' 4.
@50 '(' PCT_COL1 5.2 ')' @60 '_MHOR_1' @70 '(95% CI)';
end; run;
Stroke | No Stroke | Hrt_failure | No hrt_failure | |||
drug | count & pct of stroke group who took the drug | count &pct of the no-stroke group who took the drug | odds ratio between Stroke and drug with 95% CI | count & pct of the Hrt_failure group who took the drug | count & pct of the no- hrt_failure group who took the drug | odds ratio between hrt_failure and drug with 95% CI |
A | XXXX (xx.xx) | XXXX (xx.xx) | X.xx (x.xx, x.xx) | XXXX (xx.xx) | XXXX (xx.xx) | X.xx (x.xx, x.xx) |
etc. | ||||||
… | ||||||
F |
Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.
Some generic comments:
Waste of clock cycles:
data Aaa; set Aa;
run;
The AAA set is used ONLY in
data A1; set Aaa;
output;
female = 99 ; * everyone;
output;
Might as well use AA on the set statement.
Multiple uses of creating additional unneeded data sets can lead to excessive run times as each record gets processed multiple times instead of once. In some environments you will be limited in disk space by an Admin setting. Sets like this count against your limit while serving no purpose.
Adding an additional level to a variable just to create an "all records" is also a suspect activity in general. Proc Tabulate does a very simple job with the ALL instruction for creating summaries for all values of a categorical variable:
You should have the SASHELP.CLASS data set to run this code as an example.
proc tabulate data =sashelp.class; class sex age; table sex all='All sexes', age all='All ages' ; run;
Proc Report does summaries in columns using the Break instruction.
Then your final output with multiple values in a cell like your "X.xx (x.xx, x.xx) " are going to be a tad ugly in terms of aligned values unless every single value in each cell of the column has values of very similar ranges.
Consider your process as a whole for what will need to be done when you need to add Race as a group next year and need to have the gender as you have it now, a race only summary and gender and race.
Then how would you modify that to incorporate Hispanic Ethnicity after that, or a geography such as State, or county (or better yet both)? You are going to create so many data sets that you will get lost keeping track of them, or which one holds what.
One project I worked on with school data required summaries at just about every combination of : Management region, county, school district, school name, grade, school type (public private), summarizing numbers and percentages of students vaccinated for seven diseases.
Project used 4 data sets to create 2,000+ pages of report. 1 set was the raw data read by the provider, 1 was that "cleaned up" after identifying and getting corrected totals for things like enrollments and immunization incomplete values and misspelled schools and such. Third was a summary data set used to get the counts of all the combinations needed (Prod summary is your friend). The fourth set used the counts to create percentages of enrolled students. Then specific records were displayed in the report.
The third year I was doing this report I had to add another analysis group level (a third level for school type).
My code did not require any changes to get report tables separated or summarized for that new type. Schools were changed every year. New school no code change. School removed, no code changed. School districts combined into a single district, no code change (for a single year; cross year reports would require making sure the identifications are the same for each reported year). When two immunizations were added the changes were 1) in the read program to get the new values, added to the summarization step (add names to a Var statement), and in the 4th step created the new calculated variables. Since these were done in ARRAYS of like used variables that meant adding 3 additional variables to the array definitions and then providing new LABELS for those variables.
Why do you not want to use proc report?
Is PROC TABULATE an option? Going down to a data step is doable but really not a great idea when there are more automated options. It's more error prone, more difficult to maintain.
Please post your data in a form that's a bit easier for us to use by following the instructions here.
https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...
Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.
Some generic comments:
Waste of clock cycles:
data Aaa; set Aa;
run;
The AAA set is used ONLY in
data A1; set Aaa;
output;
female = 99 ; * everyone;
output;
Might as well use AA on the set statement.
Multiple uses of creating additional unneeded data sets can lead to excessive run times as each record gets processed multiple times instead of once. In some environments you will be limited in disk space by an Admin setting. Sets like this count against your limit while serving no purpose.
Adding an additional level to a variable just to create an "all records" is also a suspect activity in general. Proc Tabulate does a very simple job with the ALL instruction for creating summaries for all values of a categorical variable:
You should have the SASHELP.CLASS data set to run this code as an example.
proc tabulate data =sashelp.class; class sex age; table sex all='All sexes', age all='All ages' ; run;
Proc Report does summaries in columns using the Break instruction.
Then your final output with multiple values in a cell like your "X.xx (x.xx, x.xx) " are going to be a tad ugly in terms of aligned values unless every single value in each cell of the column has values of very similar ranges.
Consider your process as a whole for what will need to be done when you need to add Race as a group next year and need to have the gender as you have it now, a race only summary and gender and race.
Then how would you modify that to incorporate Hispanic Ethnicity after that, or a geography such as State, or county (or better yet both)? You are going to create so many data sets that you will get lost keeping track of them, or which one holds what.
One project I worked on with school data required summaries at just about every combination of : Management region, county, school district, school name, grade, school type (public private), summarizing numbers and percentages of students vaccinated for seven diseases.
Project used 4 data sets to create 2,000+ pages of report. 1 set was the raw data read by the provider, 1 was that "cleaned up" after identifying and getting corrected totals for things like enrollments and immunization incomplete values and misspelled schools and such. Third was a summary data set used to get the counts of all the combinations needed (Prod summary is your friend). The fourth set used the counts to create percentages of enrolled students. Then specific records were displayed in the report.
The third year I was doing this report I had to add another analysis group level (a third level for school type).
My code did not require any changes to get report tables separated or summarized for that new type. Schools were changed every year. New school no code change. School removed, no code changed. School districts combined into a single district, no code change (for a single year; cross year reports would require making sure the identifications are the same for each reported year). When two immunizations were added the changes were 1) in the read program to get the new values, added to the summarization step (add names to a Var statement), and in the 4th step created the new calculated variables. Since these were done in ARRAYS of like used variables that meant adding 3 additional variables to the array definitions and then providing new LABELS for those variables.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.