Miracle
Barite | Level 11

Hi everyone.


Hope you all had a great Christmas and New Year.

I wish to translate the Stata code below to SAS, and I'm wondering whether my SAS code is right, because the results from Stata differ from those from SAS.

Stata:

svyset _n,  poststrata(poststrata) postweight(aweight)

svy, subpop(european): logistic case  i.age_cat i.interviewmethod i.status i.deprivation

SAS:

proc surveylogistic data=bc;

strata poststrata;

weight aweight;

domain european;

class age_cat(ref="0") interviewmethod(ref="3") deprivation(ref="0") / param=ref;

model case(event="1") = age_cat  interviewmethod status deprivation;

run;

Your help is greatly appreciated. Have a good day.

1zmm
Quartz | Level 8

By default, STATA's method of creating indicator ("dummy") variables for a categorical variable is to create such variables for all levels of the categorical variable and to omit the indicator variable corresponding to the smallest level of the categorical variable.  One of your categorical variables in the SAS code, INTERVIEWMETHOD, has a reference level [=3] that is probably NOT the smallest level for this variable [probably something less than 3].  Therefore, to translate the STATA coding to SAS, specify the smallest level for INTERVIEWMETHOD as its reference level in the CLASS statement.

Or, you can study the STATA documentation to change the default behavior for STATA in creating indicator variables to specify instead level 3 of INTERVIEWMETHOD as its reference level.
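
A minimal sketch of the first suggestion, reusing the dataset and variable names from the original post. The global REF=FIRST option asks SAS to use the first ordered level of each CLASS variable as its reference, which for unformatted numeric codes should match Stata's default of dropping the smallest level. Listing status in the CLASS statement is an assumption made here only because the Stata model uses i.status; the original SAS code does not include it.

```sas
/* Sketch only: use the lowest level of every categorical predictor as
   the reference level, mirroring Stata's default for i. variables.   */
proc surveylogistic data=bc;
  strata poststrata;
  weight aweight;
  domain european;
  /* status is added to CLASS as an assumption (Stata uses i.status) */
  class age_cat interviewmethod status deprivation / param=ref ref=first;
  model case(event="1") = age_cat interviewmethod status deprivation;
run;
```

This changes only which level is the baseline for each set of indicator variables; it does not change the survey design that SAS assumes.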

Miracle
Barite | Level 11

Hi 1zmm, thanks for your reply.

I'm interested in the effect of deprivation after adjusting for the other variables, so changing the reference group of INTERVIEWMETHOD should not be necessary.

1zmm
Quartz | Level 8

Your original question asked why the results from SAS were not the same as those from STATA.  Since you did not show the results from either program, all I could do was suggest one possible reason why these results may differ.  Although changing the reference group for INTERVIEWMETHOD should not change the overall effect of deprivation after adjustment, if you had changed this reference group, did the results from SAS still differ from those of STATA?

Miracle
Barite | Level 11

Here is part of the SAS output:

                  Domain Summary

Number of Observations                                4342

Number of Observations in Domain               264

Number of Observations not in Domain         4078

Sum of Weights in Domain                            267.82300

             Variance Estimation

Method                                             Taylor Series

Variance Adjustment                       Degrees of Freedom (DF)

Number of Observations Read        4342

Number of Observations Used        4331

Sum of Weights Read                     267.823

Sum of Weights Used                     261.643

I thought the "Sum of Weights in Domain" should be equal to "Number of Observations in Domain".

and the "Sum of Weights Used" should be equal to the number of observations with complete data in the surveylogistic regression.

Can anyone please shed some light on this? Perhaps I did it wrong?

Super thanks in advance!!

1zmm
Quartz | Level 8

Were the weights integral weights equal to 1.00?  If not, why would you expect a weighted sum equal to an integral number of observations, particularly in a DOMAIN analysis where you are studying a subgroup of the entire sample?
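
One quick way to check this, sketched under the assumption that european is coded 1 for members of the domain (the coding is not shown in the thread), is to compare the count of observations with the sum of their weights:

```sas
/* Sketch only: N is the number of non-missing weights in the domain,
   SUM is their total. The two agree only if every weight equals 1.  */
proc means data=bc n sum mean min max;
  where european = 1;   /* assumed coding of the domain variable */
  var aweight;
run;
```

If the MIN and MAX columns are not both 1, the "Sum of Weights in Domain" has no reason to equal the "Number of Observations in Domain".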

Miracle
Barite | Level 11

Hi 1zmm. I really appreciate your reply.

Here is a fraction of the Stata output.

. svy, subpop(european): logistic case  i.age_cat i.interviewmethod i.status i.deprivation

(running logistic on estimation sample)

Survey: Logistic regression

Number of strata   =         1        Number of obs      =       4337
Number of PSUs     =      4337        Population size    =     14.996
N. of poststrata   =        17        Subpop. no. of obs =        259
                                      Subpop. size       =  3.9069105
                                      Design df          =       4336
                                      F(  10,   4327)    =       1.92
                                      Prob > F           =     0.0382

I have 4342 individuals in total, 11 of whom have incomplete data for the logistic regression. There are 264 Europeans, 5 of whom have incomplete data, hence Subpop. no. of obs = 259.
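
These counts can be cross-checked in SAS with a rough sketch like the one below. It assumes "incomplete" means a missing value in any model variable; bc_check and complete are hypothetical names introduced only for this check.

```sas
/* Sketch only: flag observations with no missing model variables.
   CMISS counts missing values among numeric or character arguments. */
data bc_check;
  set bc;
  complete = (cmiss(case, age_cat, interviewmethod, status, deprivation) = 0);
run;

/* Cross-tabulate domain membership against completeness */
proc freq data=bc_check;
  tables european*complete / missing norow nocol nopercent;
run;
```

Observations with a missing or non-positive weight are also excluded by the SAS survey procedures, so aweight could be added to the CMISS list if that is a concern.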

I don't understand why the results differ between SAS and Stata. I use the same data in both. What have I done wrong?

Your help is greatly appreciated.

1zmm
Quartz | Level 8

I think the discrepancy is due to your use in the STATA SVYSET command of the options, POSTSTRATA and POSTWEIGHT.  These options are used to adjust the respondent sampling weights so that they sum to the population sizes within each poststratum to account for nonresponse and underrepresented groups in the population (cf., the STATA documentation).  These poststratification strata differ from the "design" strata used in your complex sample survey and in the SAS STRATA statement.  In SAS, these poststratification adjustments are usually performed beforehand on the respondent sampling weights so that these poststratified, adjusted respondent sampling weights are used in the SAS WEIGHT statement.

Note that your STATA output implies that the DESIGN of your survey has only one stratum and 4,337 primary sampling units.  STATA lists the number of poststrata as 17.  However, since your SAS syntax uses the variable, POSTSTRATA, as the argument of its STRATA statement, the DESIGN of your survey in SAS implies 17 strata and 4,337 primary sampling units.  Thus, STATA "sees" only one stratum in your sample design, and SAS "sees" 17 strata in your sample design.

To make the STATA output conform with the SAS output, change the STATA SVYSET command to specify 17 strata and 4,337 primary sampling units:

    svyset, clear

    svyset _n [pweight=aweight], strata(poststrata)

Miracle
Barite | Level 11

Hi 1zmm. Thanks again for taking the time to read my post.


Yes, you are right. The results are the same in SAS and Stata when I specify svyset _n [pweight=aweight], strata(poststrata).

I confused SAS's STRATA statement with Stata's poststrata() option.

Actually I wish to perform poststratification adjustment for non-response in SAS.

I have the following variables:

- weight calculated for each stratum of  ethnicity(3) * deprivation(5), by dividing the expected deprivation distribution of each ethnic group by the observed deprivation distribution from our study.

- strata for each stratum of  ethnicity(3) * deprivation(5)


However, I can't find any example online of how to do this in SAS.

Perhaps you can guide me?

Thanks in advance.


1zmm
Quartz | Level 8

Check the following reference at Lex Jansen's Internet site:

  http://www.lexjansen.com/wuss/2012/162.pdf

This reference shows several methods on how to adjust the sampling weights from the observed responses in your survey to conform with the population totals you want to poststratify to.  Poststratification implies an external standard population that provides these population totals, but such a population is not necessary to adjust for survey nonresponse.  Then use these new poststratified weights in the SAS survey analysis procedures.
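
As a rough illustration of the idea (not the method from the linked paper), the sketch below builds a simple poststratified weight as the population total divided by the number of respondents in each poststratum. The dataset poptotals and its variable pop_total are hypothetical, as are bc_ps, n_resp and ps_weight; the sketch also assumes there are no base sampling weights and that status should be treated as categorical.

```sas
/* Sketch only. Hypothetical input: poptotals has one row per
   poststratum with the external population total in pop_total.     */
proc sql;
  create table bc_ps as
  select a.*,
         p.pop_total / s.n_resp as ps_weight  /* same weight within a cell */
  from bc as a
       inner join
       (select poststrata, count(*) as n_resp
          from bc
         group by poststrata) as s
         on a.poststrata = s.poststrata
       inner join poptotals as p
         on a.poststrata = p.poststrata;
quit;

/* Use the adjusted weight; poststrata is NOT a design stratum here */
proc surveylogistic data=bc_ps;
  weight ps_weight;
  domain european;
  class age_cat interviewmethod status deprivation / param=ref ref=first;
  model case(event="1") = age_cat interviewmethod status deprivation;
run;
```

Within each poststratum the new weights sum to pop_total. The point estimates use the adjusted weights, but the standard errors may not match Stata's, because Stata's poststrata() option also changes the variance estimator.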


Miracle
Barite | Level 11

Thanks for the link. I'll go and read it. Have a good day.
