Hi Everyone,
I would like to create the following excel table from the output listing from proc surveyreg. I have code for a partial table. I am including what I want and what I have. Please help. What I want is three things:
1) Put stars to indicate significance level on the coefficient values. Statistical significance at 10%, 5%, and 1% is denoted by *, **, and ***, respectively.
2) Put Adjusted R-square in the bottom row.
3) Put number of clusters in the bottom row.
Please help in modifying the following to get what I want. Here is an example of what I want:
Variable | Dependent Variable 1 | Dependent Variable 2 |
Independent Variable 2 | 0.534*** | 0.712*** |
(0.071) | (0.030) | |
Independent Variable 3 | 0.451** | -0.382 |
(0.369) | (0.148) | |
Independent Variable 4 | 0.078*** | 0.091*** |
(0.021) | (0.011) | |
Independent Variable 5 | 0.374*** | -0.147*** |
(0.056) | (0.022) | |
Independent Variable 6 | 1.420*** | -0.605* |
(0.364) | (0.161) | |
Independent Variable 7 | 0.713*** | 0.713*** |
(0.034) | (0.019) | |
Adjusted R-Square | 0.425 | 0.241 |
Number of Clusters | 1,343 | 1,343 |
I have the following code that I use:
proc format;
picture estimatef (round)
low - < 0 = ' 9.999' (prefix='-')
0 <- high=' 9.999'
.=' ';
picture stderrf (round)
low-high=' 9.999)' (prefix='(')
.=' ';
run;
ods output parameterestimates (persist) = t1;
proc surveyreg data = e2.h1 ;
cluster id;
class industry year;
model dv1 = iv2 iv3 iv4 iv5 iv6 iv7 / noint ADJRSQ solution;
run;
ods output close;
data t1; set t1;
model = 'Dependent Variable 1 ';
run;
ods output parameterestimates (persist) = t2;
proc surveyreg data = e2.h1 ;
cluster id;
class industry year;
model dv2 = iv2 iv3 iv4 iv5 iv6 iv7 / noint ADJRSQ solution;
run;
ods output close;
data t2; set t2;
model = 'Dependent Variable 2 ';
run;
data t; set t1 t2;
run;
/*using ODS to transfer it to excel*/
ods tagsets.excelxp
file='C:\Users\Desktop\finalresults2format.xls'
style=minimal
options (orientation = 'landscape'
fittopage = 'yes'
pages_fitwidth = '1'
pages_fitheight = '100' sheet_interval='none');
proc tabulate data=t noseps;
class model parameter / order=freq;
var estimate stderr;
table parameter=' '*(estimate =' '*sum=' '*F=estimatef.
stderr=' '*sum=' '*F=stderrf.),
model=' '
/ box=[label="Variable"] rts=25 row=float misstext=' ';
run;
The report is not a quick 'one off' due to several factors:
The output tables can't be combined to be in a shape that tabulate can perform as expected. TABULATE is nice, in so far as:
The downside of tabulate is that a cell can display only
A Proc REPORT step can be coded to have a COMPUTE block that computes a complex value rendering based on more than one column (such as aforementioned estimate and probt). But the data alignments are still a little off and each column would need a CALL DEFINE to render the numbers according the the row specific format.
The tricky part of your report is that it is showing the estimate table output (which has estimate, stderr and probt in a single row) in a row-wise pivot fashion (estimate on one row and stderr on next row) stacked with rows from FitStatistic and DesignSummary output tables.
One way to deal with all the issues is take them on one by one in a series of steps in which you can control the data shape and cell value renderings and do a final simple PRINT or REPORT.
Example:
Fake data
data have(drop=_:); call streaminit(1234); length industry $25; _n_=-1; do industry = 'Mining', 'Automotive', 'Pharmaceutical'; _n_+1;_m_=-1; do year = 2017 to 2019; _m_+1; do id = 1 to 1221; array iv iv1-iv7; do over iv; iv = ceil(rand('norm', _n_*3+_m_, 1)); if iv < 1 or iv > 9 then iv = .; end; dv1 = iv1 * 1.15 + iv2 / 2 + iv3 * (1.00+rand('uniform',1)) - iv4 / 3 + iv5 * 2 - iv6 / (0.01+rand('uniform',3)) - iv7 * (1.00+rand('uniform')/7) ; dv2 = iv1/1 + iv2/2 - iv3/(0.01+rand('uniform',3)) - iv4/(3.00+rand('uniform',4)) + iv5/1.5 - iv6/2 - iv7/2 ; output; end; end; end; format dv: 9.2; run;
Custom formats and regressions
proc format; picture estimatef (round) low - < 0 = ' 9.999' (prefix='-') 0 <- high=' 9.999' .=' '; picture stderrf (round) low-high=' 9.999)' (prefix='(') .=' '; value probtstars 0 - 0.01 = ' ***' 0.01 <- 0.05 = ' **' 0.05 <- 0.10 = ' *' other = ' !!!'; run; ods output ParameterEstimates (persist) = est_dv1 DesignSummary (persist) = smy_dv1 FitStatistics (persist) = fit_dv1 ; proc surveyreg data = HAVE ; cluster id; class industry year; model dv1 = iv2 iv3 iv4 iv5 iv6 iv7 / noint ADJRSQ solution; run; ods output close; ods output ParameterEstimates (persist) = est_dv2 DesignSummary (persist) = smy_dv2 FitStatistics (persist) = fit_dv2 ; proc surveyreg data = HAVE ; cluster id; class industry year; model dv2 = iv2 iv3 iv4 iv5 iv6 iv7 / noint ADJRSQ solution; run; ods output close;
Computing formatted renderings of estimate and stderr (instead of TABULATE *F=...)
data est_dv1; set est_dv1; estimate_fmt = put(estimate,estimatef.) || put(probt,probtstars.); stderr_fmt = put(stderr,stderrf.); run; data est_dv2; set est_dv2; estimate_fmt = put(estimate,estimatef.) || put(probt,probtstars.); stderr_fmt = put(stderr,stderrf.); run;
Reshaping estimates (row-wise transpose), stacking with other regression output and their computed renderings (PUT(...)) for each modeled dependent variable
proc transpose data=est_dv1 out=est_T_dv1; by parameter; var estimate_fmt stderr_fmt; run; proc transpose data=est_dv2 out=est_T_dv2; by parameter; var estimate_fmt stderr_fmt; run; data t1 (keep=model var _name_ col1 row); model = 'dv1'; length var label1 $32; set est_T_dv1 fit_dv1 smy_dv1; var = coalesceC (parameter, label1); if label1 =: 'Adj' then col1 = put (nvalue1, 5.3); if label1 =: 'Num' then col1 = left(put (nvalue1, comma12.)); if not missing(parameter) or label1 in ('Adjusted R-Square', 'Number of Clusters' ); row+1; run; data t2 (keep=model var _name_ col1 row); model = 'dv2'; length var label1 $32; set est_T_dv2 fit_dv2 smy_dv2; var = coalesceC (parameter, label1); if label1 =: 'Adj' then col1 = put (nvalue1, 5.3); if label1 =: 'Num' then col1 = left(put (nvalue1, comma12.)); if not missing(parameter) or label1 in ('Adjusted R-Square', 'Number of Clusters' ); row+1; run; data t; set t1 t2; run;
Transposing again to get one column per modeled variable
proc sort data=t; by row model; run; proc transpose data=t out=report; by row var _name_; id model; var col1; run;
and lastly a REPORT with a /order to hide repeated var values. ODS Excel instead of tagsets.excelxp
ods excel file='surveyreg-report.xlsx'; ods html file='surveyreg-report.html'; ods listing; proc report data=report(drop=row _name_); define var / order order=data; run; ods _all_ close;
Excel output
"Statistical significance at 10%, 5%, and 1% is denoted by *, **, and ***, respectively."
Suppose you have
Tests of Model Effects Effect Num DF F Value Pr > F Model 6 8052.07 <.0001 iv2 1 16970.4 <.0001 iv3 1 35.43 <.0001 iv4 1 577.90 <.0001 iv5 1 5.53 0.0207 iv6 1 27.56 <.0001 iv7 1 83.00 <.0001 NOTE: The denominator degrees of freedom for the F tests is 99.
How do your star values 10% 5% 1% correspond to Pr > F values ?
Hi Richard,
To answer your question from your example,
The estimates of iv2, iv3, iv4, iv6, and iv7 will have three (***) star.
The estimate of iv5 will have two (**) star.
The report is not a quick 'one off' due to several factors:
The output tables can't be combined to be in a shape that tabulate can perform as expected. TABULATE is nice, in so far as:
The downside of tabulate is that a cell can display only
A Proc REPORT step can be coded to have a COMPUTE block that computes a complex value rendering based on more than one column (such as aforementioned estimate and probt). But the data alignments are still a little off and each column would need a CALL DEFINE to render the numbers according the the row specific format.
The tricky part of your report is that it is showing the estimate table output (which has estimate, stderr and probt in a single row) in a row-wise pivot fashion (estimate on one row and stderr on next row) stacked with rows from FitStatistic and DesignSummary output tables.
One way to deal with all the issues is take them on one by one in a series of steps in which you can control the data shape and cell value renderings and do a final simple PRINT or REPORT.
Example:
Fake data
data have(drop=_:); call streaminit(1234); length industry $25; _n_=-1; do industry = 'Mining', 'Automotive', 'Pharmaceutical'; _n_+1;_m_=-1; do year = 2017 to 2019; _m_+1; do id = 1 to 1221; array iv iv1-iv7; do over iv; iv = ceil(rand('norm', _n_*3+_m_, 1)); if iv < 1 or iv > 9 then iv = .; end; dv1 = iv1 * 1.15 + iv2 / 2 + iv3 * (1.00+rand('uniform',1)) - iv4 / 3 + iv5 * 2 - iv6 / (0.01+rand('uniform',3)) - iv7 * (1.00+rand('uniform')/7) ; dv2 = iv1/1 + iv2/2 - iv3/(0.01+rand('uniform',3)) - iv4/(3.00+rand('uniform',4)) + iv5/1.5 - iv6/2 - iv7/2 ; output; end; end; end; format dv: 9.2; run;
Custom formats and regressions
proc format; picture estimatef (round) low - < 0 = ' 9.999' (prefix='-') 0 <- high=' 9.999' .=' '; picture stderrf (round) low-high=' 9.999)' (prefix='(') .=' '; value probtstars 0 - 0.01 = ' ***' 0.01 <- 0.05 = ' **' 0.05 <- 0.10 = ' *' other = ' !!!'; run; ods output ParameterEstimates (persist) = est_dv1 DesignSummary (persist) = smy_dv1 FitStatistics (persist) = fit_dv1 ; proc surveyreg data = HAVE ; cluster id; class industry year; model dv1 = iv2 iv3 iv4 iv5 iv6 iv7 / noint ADJRSQ solution; run; ods output close; ods output ParameterEstimates (persist) = est_dv2 DesignSummary (persist) = smy_dv2 FitStatistics (persist) = fit_dv2 ; proc surveyreg data = HAVE ; cluster id; class industry year; model dv2 = iv2 iv3 iv4 iv5 iv6 iv7 / noint ADJRSQ solution; run; ods output close;
Computing formatted renderings of estimate and stderr (instead of TABULATE *F=...)
data est_dv1; set est_dv1; estimate_fmt = put(estimate,estimatef.) || put(probt,probtstars.); stderr_fmt = put(stderr,stderrf.); run; data est_dv2; set est_dv2; estimate_fmt = put(estimate,estimatef.) || put(probt,probtstars.); stderr_fmt = put(stderr,stderrf.); run;
Reshaping estimates (row-wise transpose), stacking with other regression output and their computed renderings (PUT(...)) for each modeled dependent variable
proc transpose data=est_dv1 out=est_T_dv1; by parameter; var estimate_fmt stderr_fmt; run; proc transpose data=est_dv2 out=est_T_dv2; by parameter; var estimate_fmt stderr_fmt; run; data t1 (keep=model var _name_ col1 row); model = 'dv1'; length var label1 $32; set est_T_dv1 fit_dv1 smy_dv1; var = coalesceC (parameter, label1); if label1 =: 'Adj' then col1 = put (nvalue1, 5.3); if label1 =: 'Num' then col1 = left(put (nvalue1, comma12.)); if not missing(parameter) or label1 in ('Adjusted R-Square', 'Number of Clusters' ); row+1; run; data t2 (keep=model var _name_ col1 row); model = 'dv2'; length var label1 $32; set est_T_dv2 fit_dv2 smy_dv2; var = coalesceC (parameter, label1); if label1 =: 'Adj' then col1 = put (nvalue1, 5.3); if label1 =: 'Num' then col1 = left(put (nvalue1, comma12.)); if not missing(parameter) or label1 in ('Adjusted R-Square', 'Number of Clusters' ); row+1; run; data t; set t1 t2; run;
Transposing again to get one column per modeled variable
proc sort data=t; by row model; run; proc transpose data=t out=report; by row var _name_; id model; var col1; run;
and lastly a REPORT with a /order to hide repeated var values. ODS Excel instead of tagsets.excelxp
ods excel file='surveyreg-report.xlsx'; ods html file='surveyreg-report.html'; ods listing; proc report data=report(drop=row _name_); define var / order order=data; run; ods _all_ close;
Excel output
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.