Stata survey coding to SAS

Regular Contributor
Posts: 204

Stata survey coding to SAS

Hi everyone.


Hope you all had a great Christmas and New Year.

I want to translate the Stata code below to SAS, and I'm wondering whether my SAS code is right, because the Stata results differ from the SAS results.

Stata:

svyset _n,  poststrata(poststrata) postweight(aweight)

svy, subpop(european): logistic case  i.age_cat i.interviewmethod i.status i.deprivation

SAS:

    proc surveylogistic data=bc;
        strata poststrata;
        weight aweight;
        domain european;
        class age_cat(ref="0") interviewmethod(ref="3") deprivation(ref="0") / param=ref;
        model case(event="1") = age_cat interviewmethod status deprivation;
    run;

Your help is greatly appreciated. Have a good day.

Regular Contributor
Posts: 152

Re: Stata survey coding to SAS

By default, Stata's factor-variable notation (the i. prefix) creates indicator ("dummy") variables for every level of a categorical variable and omits the indicator for the smallest level, which becomes the reference level.  One of your categorical variables in the SAS code, INTERVIEWMETHOD, has a reference level [=3] that is probably NOT the smallest level of this variable [the smallest is probably less than 3].  Therefore, to reproduce the Stata coding in SAS, specify the smallest level of INTERVIEWMETHOD as its reference level in the CLASS statement.

Or, you can consult the Stata documentation on changing the default base level, so that Stata instead uses level 3 of INTERVIEWMETHOD as its reference level.
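
As a rough sketch of the first option, the CLASS statement below asks SAS to use the first ordered level of every categorical variable as the reference, which is what Stata's i. prefix does by default. It reuses the poster's dataset and variable names. Two things here are assumptions and not part of the original code: STATUS is added to the CLASS statement because the Stata command uses i.status, and ORDER=INTERNAL is added so that "first" means the smallest unformatted value even if formats are attached.

```sas
proc surveylogistic data=bc;
   strata poststrata;
   weight aweight;
   domain european;
   /* REF=FIRST: first ordered level is the reference (Stata's i. default).  */
   /* ORDER=INTERNAL: order levels by unformatted value (assumption).        */
   /* STATUS in CLASS is an assumption, mirroring i.status in Stata.         */
   class age_cat(ref=first) interviewmethod(ref=first)
         status(ref=first) deprivation(ref=first) / param=ref order=internal;
   model case(event="1") = age_cat interviewmethod status deprivation;
run;
```

Only the CLASS statement differs from the original SAS code. The other statements are left as posted.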

Regular Contributor
Posts: 204

Re: Stata survey coding to SAS

Hi 1zmm, thanks for your reply.

I'm interested in the effect of deprivation after adjusting for the other variables, so changing the reference group of INTERVIEWMETHOD is not necessary.

Regular Contributor
Posts: 152

Re: Stata survey coding to SAS

Your original question asked why the results from SAS were not the same as those from Stata.  Since you did not show the results from either program, all I could do was suggest one possible reason why these results may differ.  Changing the reference group for INTERVIEWMETHOD should not change the overall effect of deprivation after adjustment, but if you do change this reference group, do the results from SAS still differ from those of Stata?

Regular Contributor
Posts: 204

Re: Stata survey coding to SAS

The SAS output is:

    Domain Summary

    Number of Observations                    4342
    Number of Observations in Domain           264
    Number of Observations not in Domain      4078
    Sum of Weights in Domain             267.82300

    Variance Estimation

    Method                  Taylor Series
    Variance Adjustment     Degrees of Freedom (DF)

    Number of Observations Read     4342
    Number of Observations Used     4331
    Sum of Weights Read          267.823
    Sum of Weights Used          261.643

I thought the "Sum of Weights in Domain" should equal the "Number of Observations in Domain", and the "Sum of Weights Used" should equal the number of observations with complete data in the surveylogistic regression.

Can anyone please shed some light on this? Perhaps I did it wrong?

Super thanks in advance!!

Regular Contributor
Posts: 152

Re: Stata survey coding to SAS

Were all the weights equal to 1.00?  If not, why would you expect a weighted sum to equal a whole number of observations, particularly in a DOMAIN analysis, where you are studying a subgroup of the entire sample?
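
One quick way to check this, sketched with the dataset and variable names from the original post, is to compare the count of observations with the sum of the weights inside and outside the domain:

```sas
/* N versus the sum of AWEIGHT for each level of EUROPEAN.              */
/* If the weights are not all 1, SUM will generally differ from N.      */
proc means data=bc n sum min max;
   class european;
   var aweight;
run;
```

If MIN and MAX are not both 1, the weighted sums have no reason to match the observation counts.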

Regular Contributor
Posts: 204

Re: Stata survey coding to SAS

Hi 1zmm. I really appreciate your reply.

Here is a fraction of the Stata output.

    . svy, subpop(european): logistic case i.age_cat i.interviewmethod i.status i.deprivation
    (running logistic on estimation sample)

    Survey: Logistic regression

    Number of strata   =         1        Number of obs      =       4337
    Number of PSUs     =      4337        Population size    =     14.996
    N. of poststrata   =        17        Subpop. no. of obs =        259
                                          Subpop. size       =  3.9069105
                                          Design df          =       4336
                                          F(  10,   4327)    =       1.92
                                          Prob > F           =     0.0382

I have 4342 individuals in total, 11 of whom have incomplete data for the logistic regression. There are 264 Europeans in total, 5 of whom have incomplete data, hence Subpop. no. of obs = 259.

I don't understand why the results differ between SAS and Stata. I use the same data in both. What have I done wrong?

Your help is greatly appreciated.

Regular Contributor
Posts: 152

Re: Stata survey coding to SAS

I think the discrepancy comes from the POSTSTRATA and POSTWEIGHT options in your Stata SVYSET command.  These options adjust the respondent sampling weights so that they sum to the population sizes within each poststratum, to account for nonresponse and underrepresented groups in the population (cf. the Stata documentation).  Poststrata differ from the "design" strata of your complex sample survey, which are what the SAS STRATA statement expects.  In SAS, the poststratification adjustment is usually applied to the respondent sampling weights beforehand, and the adjusted weights are then supplied in the SAS WEIGHT statement.

Note that your Stata output implies that the DESIGN of your survey has only one stratum and 4,337 primary sampling units; Stata lists the number of poststrata as 17.  However, because your SAS syntax uses the variable POSTSTRATA in its STRATA statement, the DESIGN of your survey in SAS has 17 strata and 4,337 primary sampling units.  Thus, Stata "sees" only one stratum in your sample design, and SAS "sees" 17 strata.

To make the Stata output conform to the SAS output, change the Stata svyset command to specify 17 strata and 4,337 primary sampling units (Stata commands are case-sensitive, so type them in lowercase):

    svyset, clear

    svyset _n [pweight=aweight], strata(poststrata)

Regular Contributor
Posts: 204

Re: Stata survey coding to SAS

Hi 1zmm. Thanks again for taking the time to read my post.


Yes, you are right. The results are the same in SAS and Stata when I specify svyset _n [pweight=aweight], strata(poststrata).

I confused SAS's STRATA statement with Stata's poststrata() option.

What I actually want to do is perform a poststratification adjustment for non-response in SAS.

I have these variables:

- weight, calculated for each stratum of ethnicity(3) * deprivation(5) by dividing the expected deprivation distribution of each ethnic group by the observed deprivation distribution from our study.

- strata, identifying each stratum of ethnicity(3) * deprivation(5)


However, I can't find any example online of how to do this in SAS.

Perhaps you can guide me?

Thanks in advance.


Regular Contributor
Posts: 152

Re: Stata survey coding to SAS

Check the following reference on Lex Jansen's website:

  http://www.lexjansen.com/wuss/2012/162.pdf

This reference shows several methods for adjusting the sampling weights of the observed responses in your survey so that they conform to the population totals you want to poststratify to.  Poststratification implies an external standard population that provides these population totals, but such a population is not necessary to adjust for survey nonresponse.  Once you have the new poststratified weights, use them in the SAS survey analysis procedures.
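
A minimal sketch of the simplest such adjustment (ratio adjustment within poststrata) might look like the following. It is not taken from the linked paper. POP_TOTALS, POP_TOTAL, BC_PS and PS_WEIGHT are hypothetical names introduced here for illustration: POP_TOTALS is assumed to be a dataset with one row per poststratum holding the known population total in POP_TOTAL. The step rescales each weight so that the weights in a poststratum sum to that total.

```sas
/* Hypothetical input: POP_TOTALS has one row per POSTSTRATA value,     */
/* with the known population total in POP_TOTAL.                        */
proc sql;
   create table bc_ps as
   select a.*,
          a.aweight * t.pop_total / s.sum_w as ps_weight
   from bc as a
        inner join
        (select poststrata, sum(aweight) as sum_w
           from bc
          group by poststrata) as s
          on a.poststrata = s.poststrata
        inner join pop_totals as t
          on a.poststrata = t.poststrata;
quit;

/* Check: within each poststratum, PS_WEIGHT should sum to POP_TOTAL.   */
proc means data=bc_ps sum;
   class poststrata;
   var ps_weight;
run;

/* Use the adjusted weights in the analysis. No STRATA statement here,  */
/* on the assumption that the poststrata are not design strata.         */
proc surveylogistic data=bc_ps;
   weight ps_weight;
   domain european;
   class age_cat(ref="0") interviewmethod(ref="3") deprivation(ref="0") / param=ref;
   model case(event="1") = age_cat interviewmethod status deprivation;
run;
```

Note that supplying adjusted weights this way treats them as ordinary sampling weights. Stata's poststrata() option also accounts for the adjustment when it estimates variances, so the standard errors from this approach need not match Stata's exactly.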


Regular Contributor
Posts: 204

Re: Stata survey coding to SAS

Thanks for the link. I'll go and read it. Have a good day.
