Hi everyone.
Hope you all had a great Christmas and New Year.
I want to translate the Stata code below into SAS, and I'm wondering whether my SAS code is right, because the results from Stata differ from those from SAS.
Stata:
svyset _n, poststrata(poststrata) postweight(aweight)
svy, subpop(european): logistic case i.age_cat i.interviewmethod i.status i.deprivation
SAS:
proc surveylogistic data=bc;
strata poststrata;
weight aweight;
domain european;
class age_cat(ref="0") interviewmethod(ref="3") deprivation(ref="0") / param=ref;
model case(event="1") = age_cat interviewmethod status deprivation;
run;
Your help is greatly appreciated. Have a good day.
By default, STATA creates indicator ("dummy") variables for every level of a categorical variable and omits the indicator for the smallest level, which then serves as the reference level. One of the categorical variables in your SAS code, INTERVIEWMETHOD, has a reference level [=3] that is probably NOT its smallest level [probably something less than 3]. Therefore, to translate the STATA model to SAS, specify the smallest level of INTERVIEWMETHOD as its reference level in the CLASS statement.
Or, you can change STATA's default instead and make level 3 of INTERVIEWMETHOD the reference level, for example with the factor-variable base operator: ib3.interviewmethod.
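As a rough illustration of the CLASS statement change described above, the sketch below makes the lowest level of every categorical predictor the reference level, which is how the i. prefix behaves by default in STATA. It assumes the CLASS variables carry no user-defined formats, so that the default level ordering follows their raw numeric values. It also lists STATUS in the CLASS statement to mirror i.status; the original SAS code did not, which fits a different model if STATUS has more than two levels. The design statements are copied unchanged from the question.

```sas
/* Sketch: reference-cell coding with the lowest level of each CLASS
   variable as the reference, mirroring the default i.varname coding.
   Assumes no formats are attached to the CLASS variables. */
proc surveylogistic data=bc;
  strata poststrata;   /* design statements as posted in the question */
  weight aweight;
  domain european;
  class age_cat interviewmethod status deprivation / param=ref ref=first;
  model case(event="1") = age_cat interviewmethod status deprivation;
run;
```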
Hi 1zmm, thanks for your reply.
I'm interested in the effect of deprivation after adjusting for the other variables, so changing the reference group of INTERVIEWMETHOD should not be necessary.
Your original question asked why the results from SAS were not the same as those from STATA. Since you did not show the results from either program, all I could do was suggest one possible reason why they might differ. Changing the reference group for INTERVIEWMETHOD should not change the overall effect of deprivation after adjustment, but if you do change it, do the results from SAS still differ from those of STATA?
The SAS output for
Domain Summary
  Number of Observations                  4342
  Number of Observations in Domain         264
  Number of Observations not in Domain    4078
  Sum of Weights in Domain           267.82300
Variance Estimation
  Method               Taylor Series
  Variance Adjustment  Degrees of Freedom (DF)
Number of Observations Read    4342
Number of Observations Used    4331
Sum of Weights Read         267.823
Sum of Weights Used         261.643
I thought the "Sum of Weights in Domain" should equal the "Number of Observations in Domain", and that the "Sum of Weights Used" should equal the number of observations with complete data in the logistic regression.
Can anyone please shed some light on this? Perhaps I did it wrong?
Super thanks in advance!!
Were the weights all equal to 1.00? If not, why would you expect a weighted sum to equal an integer number of observations, particularly in a DOMAIN analysis, where you are studying a subgroup of the entire sample?
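One way to check this in the data is to summarize the weights by domain. The sketch below assumes the data set BC and the variables EUROPEAN and AWEIGHT from the code earlier in the thread. The N column counts observations and the Sum column adds up their weights; the two agree only when every weight equals 1.

```sas
/* Sketch: compare the number of observations with the sum of weights
   in each domain level. */
proc means data=bc n sum mean min max;
  class european;   /* one row per domain level */
  var aweight;      /* N = observations with a non-missing weight,
                       Sum = sum of those weights */
run;
```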
Hi 1zmm. I really appreciate your reply.
Here is a fraction of the Stata output.
. svy, subpop(european): logistic case i.age_cat i.interviewmethod i.status i.deprivation
(running logistic on estimation sample)
Survey: Logistic regression
Number of strata   =         1      Number of obs      =       4337
Number of PSUs     =      4337      Population size    =     14.996
N. of poststrata   =        17      Subpop. no. of obs =        259
                                    Subpop. size       =  3.9069105
                                    Design df          =       4336
                                    F(  10,   4327)    =       1.92
                                    Prob > F           =     0.0382
I have 4342 individuals in total, 11 of whom have incomplete data for the logistic regression. There are 264 Europeans, 5 of whom have incomplete data, hence Subpop. no. of obs = 259.
I don't understand why the results differ between SAS and Stata. I use the same data in both. What have I done wrong?
Your help is greatly appreciated.
I think the discrepancy is due to the POSTSTRATA and POSTWEIGHT options in your STATA SVYSET command. These options adjust the respondent sampling weights so that they sum to the population sizes within each poststratum, to account for nonresponse and underrepresented groups in the population (cf. the STATA documentation). These poststrata differ from the "design" strata used in your complex sample survey and in the SAS STRATA statement. In SAS, poststratification adjustments are usually applied to the respondent sampling weights beforehand, and the resulting poststratified weights are then used in the SAS WEIGHT statement.
Note that your STATA output implies that the DESIGN of your survey has only one stratum and 4,337 primary sampling units. STATA lists the number of poststrata as 17. However, since your SAS syntax uses the variable POSTSTRATA as the argument of its STRATA statement, the DESIGN of your survey in SAS implies 17 strata and 4,337 primary sampling units. Thus, STATA "sees" only one stratum in your sample design, and SAS "sees" 17 strata in your sample design.
To make the STATA output conform with the SAS output, change the STATA svyset command to specify 17 strata and 4,337 primary sampling units (STATA commands are case-sensitive, so type them in lowercase):
svyset, clear
svyset _n [pweight=aweight], strata(poststrata)
Hi 1zmm. Thanks again for taking the time to read my post.
Yes, you are right. The results are the same in SAS and Stata when I specify svyset _n [pweight=aweight], strata(poststrata).
I confused SAS's STRATA statement with Stata's poststrata() option.
What I actually want is to perform a poststratification adjustment for non-response in SAS.
I have the following variables:
- weight: calculated for each stratum of ethnicity (3) * deprivation (5) by dividing the expected deprivation distribution of each ethnic group by the observed deprivation distribution from our study.
- strata: identifies each stratum of ethnicity (3) * deprivation (5).
However, I can't find any example online of how to do this in SAS.
Perhaps you can guide me?
Thanks in advance.
Check the following reference at Lex Jansen's Internet site:
http://www.lexjansen.com/wuss/2012/162.pdf
This reference shows several methods for adjusting the sampling weights of the observed responses in your survey so that they conform with the population totals you want to poststratify to. Poststratification implies an external standard population that provides these population totals, but such a population is not necessary to adjust for survey nonresponse. Then use the new poststratified weights in the SAS survey analysis procedures.
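A minimal sketch of one such adjustment, under assumptions that should be checked against your data: BC has one row per respondent, every respondent starts with a base weight of 1, and AWEIGHT holds the target total for the respondent's poststratum (constant within POSTSTRATA), which is how the STATA postweight() option interprets it. The names BC_PS, N_K and PS_WEIGHT are made up for illustration. Each respondent's weight becomes the poststratum total divided by the number of respondents in that poststratum, so the weights within each poststratum sum to its total.

```sas
/* Sketch only: poststratified weight = poststratum total / respondents
   in that poststratum (base weight assumed to be 1 for everyone). */
proc sql;
  create table bc_ps as
  select a.*,
         a.aweight / b.n_k as ps_weight
  from bc as a
       inner join
       (select poststrata, count(*) as n_k   /* respondents per poststratum */
        from bc
        group by poststrata) as b
       on a.poststrata = b.poststrata;
quit;

/* No STRATA or CLUSTER statement: one design stratum, each respondent
   its own PSU, matching svyset _n. */
proc surveylogistic data=bc_ps;
  weight ps_weight;
  domain european;
  class age_cat(ref="0") interviewmethod(ref="3") deprivation(ref="0") / param=ref;
  model case(event="1") = age_cat interviewmethod status deprivation;
run;
```

Because the adjusted weights are treated as fixed here, the standard errors may not match those from the STATA poststrata() option, which accounts for the poststratification when it estimates the variance.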
Thanks for the link. I'll go and read it. Have a good day.