selecting controls

Contributor
Posts: 35
Hi,

 

Using survey data, I am trying to compare patients who have a disease with an equal number of randomly selected controls from the same dataset who do not have the disease. I am assuming that I can select controls at random from the population, because I will use a multivariate analysis later on to assess my outcomes, and I will control for age, gender, etc. at that time.

 

I need to know the syntax for selecting random observations from survey data in SAS. Please also clarify whether it is OK to select controls at random, or whether I should match by age and gender when selecting controls (if so, I will need the syntax for matching).

 

Thank you


Accepted Solutions
Solution
‎11-19-2017 04:46 AM
Super User
Posts: 24,010

Re: selecting controls

Posted in reply to sasnewbie12

sasnewbie12 wrote:

Hi,

 

I need to know what is the syntax for finding random observations from SAS survey data; furthermore, please clarify if I am ok in selecting random controls, or should I match by age and gender during selection of controls (if so, I will need the syntax for matching). 

 

Thank you


 

 

What's the benefit of matching cases with multiple controls? Is there an added benefit in your case for matching on specific variables? Are you sure of your assumption that you can choose at random? How would you test that? Note that testing this assumption is typically a step in any report where a case/control design is used, certainly in journal publications.

 

PROC SURVEYSELECT is used for selecting random samples. 
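
As a rough illustration of simple random selection, here is a minimal sketch. The data set work.cohort, the variable complication (1 = case, 0 = non-case), and the sample size of 500 are all assumptions made up for this example; replace them with your own names and your actual number of cases.

```sas
/* Hypothetical names: work.cohort and complication are placeholders. */
/* Step 1: keep only the patients without the complication.           */
data work.controls_pool;
   set work.cohort;
   where complication = 0;
run;

/* Step 2: draw a simple random sample from that pool.                */
proc surveyselect data=work.controls_pool
                  out=work.controls
                  method=srs       /* simple random sampling, without replacement */
                  sampsize=500     /* assumed value: set to your number of cases  */
                  seed=20171119;   /* fixed seed so the draw can be reproduced    */
run;
```

This only draws controls at random; it does no matching, so whether it suits the study design is a separate question.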

 

There is no defined procedure or methodology from a statistical perspective regarding how to select case/control matches. I would suggest looking up the Mayo Clinic macros and propensity score matching. 

 

I strongly recommend you determine your statistical analysis plan first and then decide how you can use SAS to implement it. 

 

 


All Replies
Super User
Posts: 24,010

Re: selecting controls

So there is a procedure for matching, as of SAS/STAT 14.2, I think.

 

It's PROC PSMATCH.

 

http://documentation.sas.com/?docsetId=statug&docsetTarget=statug_psmatch_gettingstarted.htm&docsetV...
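
For orientation, a call might look roughly like the sketch below. The data set and variable names (work.cohort, complication, age, gender) are assumptions for illustration, and the available MATCH options differ somewhat between SAS/STAT releases, so check each statement against the documentation for your version before relying on it.

```sas
/* Hypothetical names: work.cohort, complication (1 = case), age, gender. */
proc psmatch data=work.cohort;
   class complication gender;                        /* the group variable must be a CLASS variable */
   psmodel complication(Treated='1') = age gender;   /* logistic model for the propensity score     */
   match method=greedy(k=1) caliper=0.25;            /* one greedy-matched control per case         */
   output out(obs=match)=work.matched matchid=_MatchID;  /* keep matched observations only          */
run;
```

The output data set would contain the matched cases and controls, with _MatchID identifying each matched set.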

Contributor
Posts: 30

Re: selecting controls

Posted in reply to sasnewbie12

I agree with @Reeza. You should first determine how you will match and which matching algorithm you will use, then implement that in SAS. One place to start is the greedy 5->1 digit matching algorithm, but you will need to compute propensity scores first.

 

See for example:

http://www2.sas.com/proceedings/sugi26/p214-26.pdf

 

Contributor
Posts: 35

Re: selecting controls

Posted in reply to sasnewbie12

Thanks for the responses

 

 

I think I should explain more clearly what I want to do, as I may not have been clear before.

 

I am replicating a study that was previously done with a different and more comprehensive dataset. This will be a cross-sectional study. 

We will look for cases of complication X that occurred during clinical operations A, B and C in a large dataset of millions of cases. We will then compare these cases of complication X with controls from the same dataset who underwent operations A, B and C but did not have complication X. We will assess which types of patients are more likely to have complication X during operations A, B and C.

 

I can identify the cases. However, I am not sure how to go back and randomly select controls from the same population. 

Is Proc Surveyselect what I need? 

 

 

Thanks

Super User
Posts: 24,010

Re: selecting controls

Posted in reply to sasnewbie12

sasnewbie12 wrote:

I can identify the cases. However, I am not sure how to go back and randomly select controls from the same population.

Is Proc Surveyselect what I need?

 

 

PROC SURVEYSELECT does random selection. It's very, very unlikely that the original study used purely random selection as its case-control methodology.

 

And the matching process isn't a straightforward procedure where you can just say "use PROC GIVE_ME_ANSWER". There are many ways it can be implemented, and each method will likely change your results.

Super Contributor
Posts: 332

Re: selecting controls

Posted in reply to sasnewbie12

I have not used Proc SurveySelect, so this is just a small piece of advice.

 

For every case there can be hundreds of controls. Before selecting controls, decide which patient characteristics have a direct impact on the outcome variable (Complication X). Usually these are age, gender, any residential characteristics that serve as an indirect indicator of economic status, plus any other variables that may be associated with X. An exact match between case and control on all of these may not be feasible, so some ranges have to be accepted. You can match exactly on gender, but usually not on age. For example, the age of the case plus or minus 1 year may be accepted when selecting the control. These relaxed conditions should be fixed at this stage. An important note: selecting more than 3 controls per case is not worthwhile in case-control studies; fewer is better. Let the number of controls be 2.

 

Select a case with Complication X.

    Note the age, gender, and the other variables you have decided on.

Find those without Complication X.

    Match on age, gender, and the other variables. Count the number of matches.

    If only 1 control is found, choose that patient.

    If you find more than, say, 10, choose 2 of them at random. You may use Floyd's algorithm for this.

    If no control is found for a selected case, relax the matching criteria as decided in advance of the selection process.
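
The steps above could be sketched in SAS roughly as follows. This is only an illustration under assumed names (work.cohort, patient_id, age, gender, complication); it matches exactly on gender and on age within 1 year, and it puts the candidates in random order and keeps the first 2 per case instead of using Floyd's algorithm.

```sas
/* Hypothetical names: work.cohort, patient_id, age, gender, complication. */
/* Pair every case with every eligible control and attach a random number. */
proc sql;
   create table work.candidates as
   select c.patient_id      as case_id,
          k.patient_id      as control_id,
          ranuni(20171119)  as rand_order     /* random key, fixed seed */
   from work.cohort as c
        inner join work.cohort as k
        on  c.gender = k.gender               /* exact match on gender  */
        and abs(c.age - k.age) <= 1           /* age within +/- 1 year  */
   where c.complication = 1
     and k.complication = 0
   order by case_id, rand_order;
quit;

/* Keep at most 2 randomly ordered controls for each case. */
data work.matched;
   set work.candidates;
   by case_id;
   if first.case_id then n_picked = 0;
   n_picked + 1;                              /* running count within the case */
   if n_picked <= 2;
run;
```

Note the limits of this sketch: the same control can be picked for more than one case, cases with no candidate drop out of the join and would need the relaxed criteria applied in a second pass, and with millions of records the join may need to be restricted or run in batches.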

    

☑ This topic is solved.
