mt_santos
Calcite | Level 5

Hi, does anyone know how to run a two-stage least squares (2SLS) regression on a survey dataset with a complex survey design?

Thanks,

Maria

ets_kps
SAS Employee

Hi Maria,

Can you tell us a bit more about your "complex survey design"? We have a portfolio of "SURVEY*" procedures, but none addresses 2SLS specifically, though I am sure they could be cobbled together.

Ken

mt_santos
Calcite | Level 5

Hi Ken, here is a description of the sample design. If you need more information, let me know.

The sample design incorporated a two-stage sampling strategy. The first stage involved the selection of communities to participate in the survey. First Nations communities were stratified by region, sub-region, and community size [large (1500+ people), medium (300-1499 people), and small (<300 people)]. Large communities were automatically included, while medium and small communities were randomly selected with equal probability within their respective strata.

The second stage of sampling pertained to the selection of individuals within each community sampled. Community members were identified using band membership lists. Data were gathered to represent eight categories of the community population (gender by four age-groups). The sampling rate within each community was determined as a function of the overall sub-region probability (within regions) and the probability of selection of the community (within sub-region).

Individual responses were weighted using INAC registry counts so that the sample more accurately represents the population.


So typically we would use PROC SURVEYMEANS, SURVEYFREQ, or SURVEYREG when analyzing complex sample data. For 2SLS, is it possible to run two separate regressions? And how would that work? I guess that is what you mean by cobbling together the "SURVEY" procedures.


Thank you,

Maria

ballardw
Super User

This description is of two-stage SAMPLING, and it looks like the community size variable would be a STRATA variable.

If I understand correctly, your code should include a STRATA statement, something like:

STRATA Region CommunitySize;

and should be good to go.

There may be some syntax issues if you want to do domain analysis on a strata variable. If so, a second variable identical to the strata variable may be needed for the DOMAIN statement.
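
As a rough illustration of how those statements might fit together, here is a minimal sketch. Every name in it (the data set survey, the variables Community, SampWeight, Y, and X, and the copied variable SizeDomain) is a hypothetical placeholder, not something taken from the actual data. It also assumes that communities are the clusters and that the design description's sub-region level is left out of the strata, which may not suit the real design.

```sas
/* Sketch only: all data set and variable names are hypothetical. */

/* Copy the strata variable so the copy can be used on the DOMAIN statement. */
data survey2;
   set survey;
   SizeDomain = CommunitySize;
run;

proc surveyreg data=survey2;
   strata Region CommunitySize;   /* first-stage stratification      */
   cluster Community;             /* assumed primary sampling unit   */
   weight SampWeight;             /* assumed analysis weight         */
   domain SizeDomain;             /* domain analysis by size class   */
   model Y = X;
run;
```

Whether the automatically included large communities should be treated as clusters or as strata in their own right is a design question this sketch does not settle.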

mt_santos
Calcite | Level 5

Hi ballardw, thank you. Yes, this is a two-stage sampling design. However, we are looking at a model in which the relationship between the dependent variable and an independent variable is possibly bidirectional. So we would like to use an instrumental variable that correlates with the independent variable but is related to the dependent variable only through that independent variable. That is the reason for performing a two-stage least squares regression. So I'm wondering whether there is any code available to run a 2SLS regression on a data set that comes from a two-stage sampling design.

Maria
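
For reference, one way the "two separate regressions" idea raised above could be cobbled together with PROC SURVEYREG is sketched below. This is an illustration under stated assumptions, not a confirmed or endorsed method: the data set survey and the variables Y (outcome), X (endogenous predictor), Z (instrument), W (exogenous covariate), Community, and SampWeight are all hypothetical names. The first stage saves predicted values of X through the OUTPUT statement, and the second stage uses them in place of X.

```sas
/* Sketch only: all data set and variable names are hypothetical. */

/* Stage 1: regress the endogenous predictor on the instrument */
/* and the exogenous covariates, keeping the predicted values. */
proc surveyreg data=survey;
   strata Region CommunitySize;
   cluster Community;
   weight SampWeight;
   model X = Z W;
   output out=stage1 predicted=X_hat;
run;

/* Stage 2: regress the outcome on the stage 1 predictions */
/* and the same exogenous covariates.                       */
proc surveyreg data=stage1;
   strata Region CommunitySize;
   cluster Community;
   weight SampWeight;
   model Y = X_hat W;
run;
```

The stage 2 coefficients are the quantities of interest, but the standard errors it prints treat X_hat as a fixed, observed variable and so do not reflect the uncertainty from stage 1. A replication approach that re-runs both stages for each replicate would be one possible way to handle that; it is not shown here.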
