Solved: Creating results table from Proc Survereg

therock · Posted 03-27-2020 01:19 PM

Hi Everyone,

I would like to create the following excel table from the output listing from proc surveyreg. I have code for a partial table. I am including what I want and what I have. Please help. What I want is three things:

1) Put stars to indicate significance level on the coefficient values. Statistical significance at 10%, 5%, and 1% is denoted by *, **, and ***, respectively.

2) Put Adjusted R-square in the bottom row.

3) Put number of clusters in the bottom row.

Please help in modifying the following to get what I want. Here is an example of what I want:

Variable	Dependent Variable 1	Dependent Variable 2
Independent Variable 2	0.534***	0.712***
	(0.071)	(0.030)
Independent Variable 3	0.451**	-0.382
	(0.369)	(0.148)
Independent Variable 4	0.078***	0.091***
	(0.021)	(0.011)
Independent Variable 5	0.374***	-0.147***
	(0.056)	(0.022)
Independent Variable 6	1.420***	-0.605*
	(0.364)	(0.161)
Independent Variable 7	0.713***	0.713***
	(0.034)	(0.019)
Adjusted R-Square	0.425	0.241
Number of Clusters	1,343	1,343

I have the following code that I use:

proc format;     
	picture estimatef (round)
		low - < 0 = ' 9.999' (prefix='-')
		0  <- high=' 9.999'                                 
            .=' ';    
	picture stderrf (round)       
     	low-high=' 9.999)' (prefix='(')                                
            .=' ';                                                  
run;


ods output parameterestimates (persist) = t1;                        
                                                                    
proc surveyreg data = e2.h1 ; 
cluster id;
class industry year;
model dv1 = iv2 iv3 iv4 iv5 iv6 iv7 / noint ADJRSQ solution;  
run;  
                                                                    
ods output close;                                                   
                                                                    


data t1; set t1;
model = 'Dependent Variable 1       ';
run;

ods output parameterestimates (persist) = t2;                        
                                                                    

proc surveyreg data = e2.h1 ; 
cluster id;
class industry year;
model dv2 = iv2 iv3 iv4 iv5 iv6 iv7 / noint ADJRSQ solution;  
run;  
                                                                    
ods output close;                                                   
                                                                    
data t2; set t2;
model = 'Dependent Variable 2       ';
run;



data t; set t1 t2;
run;


/*using ODS to transfer it to excel*/
ods tagsets.excelxp
file='C:\Users\Desktop\finalresults2format.xls'
style=minimal
options (orientation = 'landscape'
fittopage = 'yes'
pages_fitwidth = '1'
pages_fitheight = '100' sheet_interval='none');

proc tabulate data=t noseps;      
  class model parameter / order=freq;                    
  var estimate stderr; 
  table parameter='  '*(estimate =' '*sum=' '*F=estimatef.                          
                     stderr=' '*sum=' '*F=stderrf.),                
         model='                               '                                                  
          / box=[label="Variable"] rts=25 row=float misstext=' '; 
run;

RichardDeVen · Posted 04-01-2020 11:56 AM

The report is not a quick 'one off' due to several factors:

Each regression has output tables that don't 'align' for the desired report
Two regressions

The output tables can't be combined to be in a shape that tabulate can perform as expected. TABULATE is nice, in so far as:

when a variable is placed in a table the format can be specified.

The downside of tabulate is that a cell can display only

A class level, class value or aggregation result from values belonging to a dimensional intersection.
The `estimate` you want shown is actually combination of the estimate and probt 'category'. Tabulate can't do that.

A Proc REPORT step can be coded to have a COMPUTE block that computes a complex value rendering based on more than one column (such as aforementioned estimate and probt). But the data alignments are still a little off and each column would need a CALL DEFINE to render the numbers according the the row specific format.

The tricky part of your report is that it is showing the estimate table output (which has estimate, stderr and probt in a single row) in a row-wise pivot fashion (estimate on one row and stderr on next row) stacked with rows from FitStatistic and DesignSummary output tables.

One way to deal with all the issues is take them on one by one in a series of steps in which you can control the data shape and cell value renderings and do a final simple PRINT or REPORT.

Example:

Fake data

data have(drop=_:);
  call streaminit(1234);

  length industry $25;                                          _n_=-1;
  do industry = 'Mining', 'Automotive', 'Pharmaceutical'; _n_+1;_m_=-1;
  do year = 2017 to 2019;                                 _m_+1;
  do id = 1 to 1221;
    array iv iv1-iv7;
    do over iv;
      iv = ceil(rand('norm', _n_*3+_m_, 1));
      if iv < 1 or iv > 9 then iv = .;
    end;
    dv1 = iv1 * 1.15
        + iv2 / 2
        + iv3 * (1.00+rand('uniform',1))
        - iv4 / 3
        + iv5 * 2
        - iv6 / (0.01+rand('uniform',3))
        - iv7 * (1.00+rand('uniform')/7)
        ;
    dv2 = iv1/1
        + iv2/2
        - iv3/(0.01+rand('uniform',3))
        - iv4/(3.00+rand('uniform',4))
        + iv5/1.5
        - iv6/2
        - iv7/2
        ;
    output;
  end;
  end;
  end;
  format dv: 9.2;
run;

Custom formats and regressions

proc format;     
	picture estimatef (round)
		low - < 0 = ' 9.999' (prefix='-')
		0  <- high=' 9.999'                                 
            .=' ';    
	picture stderrf (round)       
     	low-high=' 9.999)' (prefix='(')                                
            .=' ';                                                  
  value probtstars
      0 - 0.01 = ' ***'
      0.01 <- 0.05 = ' **'
      0.05 <- 0.10 = ' *'
      other = ' !!!';
run;


ods output 
  ParameterEstimates (persist) = est_dv1
  DesignSummary (persist)      = smy_dv1
  FitStatistics (persist)      = fit_dv1
;
 
proc surveyreg data = HAVE ; 
  cluster id;
  class industry year;
  model dv1 = iv2 iv3 iv4 iv5 iv6 iv7 / noint ADJRSQ solution;  
run;  
                                                                    
ods output close;                                                   
                                                                    

ods output 
  ParameterEstimates (persist) = est_dv2
  DesignSummary (persist)      = smy_dv2
  FitStatistics (persist)      = fit_dv2
;

proc surveyreg data = HAVE ; 
  cluster id;
  class industry year;
  model dv2 = iv2 iv3 iv4 iv5 iv6 iv7 / noint ADJRSQ solution;  
run;  
                                                                    
ods output close;

Computing formatted renderings of estimate and stderr (instead of TABULATE *F=...)

data est_dv1;
  set est_dv1;
  estimate_fmt = put(estimate,estimatef.) || put(probt,probtstars.);
  stderr_fmt   = put(stderr,stderrf.);
run;

data est_dv2;
  set est_dv2;
  estimate_fmt = put(estimate,estimatef.) || put(probt,probtstars.);
  stderr_fmt   = put(stderr,stderrf.);
run;

Reshaping estimates (row-wise transpose), stacking with other regression output and their computed renderings (PUT(...)) for each modeled dependent variable

proc transpose data=est_dv1 out=est_T_dv1;
  by parameter;
  var estimate_fmt stderr_fmt;
run;

proc transpose data=est_dv2 out=est_T_dv2;
  by parameter;
  var estimate_fmt stderr_fmt;
run;

data t1 (keep=model var _name_ col1 row);
  model = 'dv1';

  length var label1 $32;

  set est_T_dv1 fit_dv1 smy_dv1;

  var = coalesceC (parameter, label1);
  
  if label1 =: 'Adj' then col1 = put (nvalue1, 5.3);
  if label1 =: 'Num' then col1 = left(put (nvalue1, comma12.));

  if not missing(parameter) or label1 in ('Adjusted R-Square', 'Number of Clusters' );

  row+1;
run;

data t2 (keep=model var _name_ col1 row);
  model = 'dv2';

  length var label1 $32;

  set est_T_dv2 fit_dv2 smy_dv2;

  var = coalesceC (parameter, label1);

  if label1 =: 'Adj' then col1 = put (nvalue1, 5.3);
  if label1 =: 'Num' then col1 = left(put (nvalue1, comma12.));

  if not missing(parameter) or label1 in ('Adjusted R-Square', 'Number of Clusters' );

  row+1;
run;

data t;
  set t1 t2;
run;

Transposing again to get one column per modeled variable

proc sort data=t;
  by row model;
run;

proc transpose data=t out=report;
  by row var _name_;
  id model;
  var col1;
run;

and lastly a REPORT with a /order to hide repeated var values. ODS Excel instead of tagsets.excelxp

ods excel file='surveyreg-report.xlsx';
ods html file='surveyreg-report.html';
ods listing;

proc report data=report(drop=row _name_);
 define var / order order=data;
run;

ods _all_ close;

Excel output

View solution in original post

RichardDeVen · Posted 03-31-2020 01:55 PM

"Statistical significance at 10%, 5%, and 1% is denoted by *, **, and ***, respectively."

Suppose you have

       Tests of Model Effects

Effect    Num DF    F Value    Pr > F

Model          6    8052.07    <.0001
iv2            1    16970.4    <.0001
iv3            1      35.43    <.0001
iv4            1     577.90    <.0001
iv5            1       5.53    0.0207
iv6            1      27.56    <.0001
iv7            1      83.00    <.0001

NOTE: The denominator degrees of freedom for the F tests is 99.

How do your star values 10% 5% 1% correspond to Pr > F values ?

therock · Posted 03-31-2020 02:31 PM

Hi Richard,

To answer your question from your example,

The estimates of iv2, iv3, iv4, iv6, and iv7 will have three (***) star.

The estimate of iv5 will have two (**) star.

RichardDeVen · Posted 04-01-2020 11:56 AM

The report is not a quick 'one off' due to several factors:

Each regression has output tables that don't 'align' for the desired report
Two regressions

The output tables can't be combined to be in a shape that tabulate can perform as expected. TABULATE is nice, in so far as:

when a variable is placed in a table the format can be specified.

The downside of tabulate is that a cell can display only

A class level, class value or aggregation result from values belonging to a dimensional intersection.
The `estimate` you want shown is actually combination of the estimate and probt 'category'. Tabulate can't do that.

A Proc REPORT step can be coded to have a COMPUTE block that computes a complex value rendering based on more than one column (such as aforementioned estimate and probt). But the data alignments are still a little off and each column would need a CALL DEFINE to render the numbers according the the row specific format.

The tricky part of your report is that it is showing the estimate table output (which has estimate, stderr and probt in a single row) in a row-wise pivot fashion (estimate on one row and stderr on next row) stacked with rows from FitStatistic and DesignSummary output tables.

One way to deal with all the issues is take them on one by one in a series of steps in which you can control the data shape and cell value renderings and do a final simple PRINT or REPORT.

Example:

Fake data

data have(drop=_:);
  call streaminit(1234);

  length industry $25;                                          _n_=-1;
  do industry = 'Mining', 'Automotive', 'Pharmaceutical'; _n_+1;_m_=-1;
  do year = 2017 to 2019;                                 _m_+1;
  do id = 1 to 1221;
    array iv iv1-iv7;
    do over iv;
      iv = ceil(rand('norm', _n_*3+_m_, 1));
      if iv < 1 or iv > 9 then iv = .;
    end;
    dv1 = iv1 * 1.15
        + iv2 / 2
        + iv3 * (1.00+rand('uniform',1))
        - iv4 / 3
        + iv5 * 2
        - iv6 / (0.01+rand('uniform',3))
        - iv7 * (1.00+rand('uniform')/7)
        ;
    dv2 = iv1/1
        + iv2/2
        - iv3/(0.01+rand('uniform',3))
        - iv4/(3.00+rand('uniform',4))
        + iv5/1.5
        - iv6/2
        - iv7/2
        ;
    output;
  end;
  end;
  end;
  format dv: 9.2;
run;

Custom formats and regressions

proc format;     
	picture estimatef (round)
		low - < 0 = ' 9.999' (prefix='-')
		0  <- high=' 9.999'                                 
            .=' ';    
	picture stderrf (round)       
     	low-high=' 9.999)' (prefix='(')                                
            .=' ';                                                  
  value probtstars
      0 - 0.01 = ' ***'
      0.01 <- 0.05 = ' **'
      0.05 <- 0.10 = ' *'
      other = ' !!!';
run;


ods output 
  ParameterEstimates (persist) = est_dv1
  DesignSummary (persist)      = smy_dv1
  FitStatistics (persist)      = fit_dv1
;
 
proc surveyreg data = HAVE ; 
  cluster id;
  class industry year;
  model dv1 = iv2 iv3 iv4 iv5 iv6 iv7 / noint ADJRSQ solution;  
run;  
                                                                    
ods output close;                                                   
                                                                    

ods output 
  ParameterEstimates (persist) = est_dv2
  DesignSummary (persist)      = smy_dv2
  FitStatistics (persist)      = fit_dv2
;

proc surveyreg data = HAVE ; 
  cluster id;
  class industry year;
  model dv2 = iv2 iv3 iv4 iv5 iv6 iv7 / noint ADJRSQ solution;  
run;  
                                                                    
ods output close;

Computing formatted renderings of estimate and stderr (instead of TABULATE *F=...)

data est_dv1;
  set est_dv1;
  estimate_fmt = put(estimate,estimatef.) || put(probt,probtstars.);
  stderr_fmt   = put(stderr,stderrf.);
run;

data est_dv2;
  set est_dv2;
  estimate_fmt = put(estimate,estimatef.) || put(probt,probtstars.);
  stderr_fmt   = put(stderr,stderrf.);
run;

Reshaping estimates (row-wise transpose), stacking with other regression output and their computed renderings (PUT(...)) for each modeled dependent variable

proc transpose data=est_dv1 out=est_T_dv1;
  by parameter;
  var estimate_fmt stderr_fmt;
run;

proc transpose data=est_dv2 out=est_T_dv2;
  by parameter;
  var estimate_fmt stderr_fmt;
run;

data t1 (keep=model var _name_ col1 row);
  model = 'dv1';

  length var label1 $32;

  set est_T_dv1 fit_dv1 smy_dv1;

  var = coalesceC (parameter, label1);
  
  if label1 =: 'Adj' then col1 = put (nvalue1, 5.3);
  if label1 =: 'Num' then col1 = left(put (nvalue1, comma12.));

  if not missing(parameter) or label1 in ('Adjusted R-Square', 'Number of Clusters' );

  row+1;
run;

data t2 (keep=model var _name_ col1 row);
  model = 'dv2';

  length var label1 $32;

  set est_T_dv2 fit_dv2 smy_dv2;

  var = coalesceC (parameter, label1);

  if label1 =: 'Adj' then col1 = put (nvalue1, 5.3);
  if label1 =: 'Num' then col1 = left(put (nvalue1, comma12.));

  if not missing(parameter) or label1 in ('Adjusted R-Square', 'Number of Clusters' );

  row+1;
run;

data t;
  set t1 t2;
run;

Transposing again to get one column per modeled variable

proc sort data=t;
  by row model;
run;

proc transpose data=t out=report;
  by row var _name_;
  id model;
  var col1;
run;

and lastly a REPORT with a /order to hide repeated var values. ODS Excel instead of tagsets.excelxp

ods excel file='surveyreg-report.xlsx';
ods html file='surveyreg-report.html';
ods listing;

proc report data=report(drop=row _name_);
 define var / order order=data;
run;

ods _all_ close;

Excel output

Creating results table from Proc Survereg

Re: Creating results table from Proc Survereg

Re: Creating results table from Proc Survereg

Re: Creating results table from Proc Survereg

Re: Creating results table from Proc Survereg

Creating results table from Proc Survereg

Re: Creating results table from Proc Survereg

Re: Creating results table from Proc Survereg

Re: Creating results table from Proc Survereg

Re: Creating results table from Proc Survereg

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!