sasnewbie12
Obsidian | Level 7

Hi,

 

Using survey data, I am trying to compare patients who have a disease with an equal number of randomly selected controls from the same dataset who do not have the disease. I am assuming that I can select controls at random from the population, as I will use a multivariate analysis later on to assess my outcomes, and I will control for age, gender, etc. at that time. 

 

I need to know what is the syntax for finding random observations from SAS survey data; furthermore, please clarify if I am ok in selecting random controls, or should I match by age and gender during selection of controls (if so, I will need the syntax for matching). 

 

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

@sasnewbie12 wrote:

Hi,

 

I need to know what is the syntax for finding random observations from SAS survey data; furthermore, please clarify if I am ok in selecting random controls, or should I match by age and gender during selection of controls (if so, I will need the syntax for matching). 

 

Thank you


 

 

What's the benefit of matching cases with multiple controls? Is there an added benefit in your case for matching via specific variables? Are you sure of your assumption that you can choose at random? How would you test that? Note that testing this assumption is typically a step in any report where a case/control design is used, certainly in journal publications. 

 

PROC SURVEYSELECT is used for selecting random samples. 
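
As a rough sketch of what that could look like for the question above, the step below draws as many controls at random as there are cases. The dataset name work.ops, the variable complication_x (with values 'Yes' and 'No') and the output names are made up for illustration, and the seed value is arbitrary.

```sas
/* Assumption: work.ops holds one row per patient, and the character */
/* variable complication_x is 'Yes' for cases and 'No' for controls. */

/* Count the cases so the control sample can be the same size. */
proc sql noprint;
   select count(*) into :n_cases trimmed
   from work.ops
   where complication_x = 'Yes';
quit;

/* Simple random sample (without replacement) of the non-cases. */
proc surveyselect data=work.ops(where=(complication_x = 'No'))
                  out=work.controls_sample
                  method=srs
                  sampsize=&n_cases
                  seed=20240416;
run;

/* Stack the cases and the sampled controls into one analysis dataset. */
data work.analysis;
   set work.ops(where=(complication_x = 'Yes'))
       work.controls_sample;
run;
```

METHOD=SRS requests simple random sampling without replacement, and fixing SEED= makes the draw reproducible. Whether a purely random draw is appropriate is a separate question from the syntax.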

 

There is no defined procedure or methodology from a statistical perspective regarding how to select case/control matches. I would suggest looking up the Mayo Clinic macros and propensity score matching. 

 

I strongly recommend you determine your statistical analysis plan first and then decide how you can use SAS to implement it. 

 

 


6 REPLIES 6
Reeza
Super User

So there is a procedure for matching, as of SAS/STAT 14.2 I think.

 

It's PROC PSMATCH.

 

http://documentation.sas.com/?docsetId=statug&docsetTarget=statug_psmatch_gettingstarted.htm&docsetV...
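
A minimal sketch of a PROC PSMATCH call, written in the style of the documentation examples. It assumes a hypothetical dataset work.ops with the variables complication_x ('Yes'/'No'), age and gender. Treating case status as the "treated" group is an assumption about how the problem would be framed, and the caliper value is only a placeholder.

```sas
/* Assumption: work.ops, complication_x, age and gender are illustrative names. */
proc psmatch data=work.ops region=cs;
   class complication_x gender;
   /* Propensity model: probability of being a case given the covariates. */
   psmodel complication_x(Treated='Yes') = age gender;
   /* Greedy nearest-neighbor matching, 1 control per case, on the logit of the score. */
   match method=greedy(k=1) stat=lps caliper=0.25;
   /* Keep only matched observations; _MatchID links each case to its control. */
   output out(obs=match)=work.matched lps=_Lps matchid=_MatchID;
run;
```

The choices here (greedy versus optimal matching, the number of controls per case, the caliper) are exactly the kind of decisions that belong in the analysis plan, since each one can change the matched sample.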

bstarr
Quartz | Level 8

I agree with @Reeza. You should determine how you will match, what matching algorithm you will use, then implement that via SAS. One place to start is the Greedy 5->1 matching algorithm, but you will need to compute propensity scores first. 

 

See for example:

http://www2.sas.com/proceedings/sugi26/p214-26.pdf
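
If you go the route of computing propensity scores yourself before matching, one common way is a logistic regression. This is only a sketch: the dataset work.ops, the variables complication_x ('Yes'/'No'), age and gender, and the output name work.ps are all hypothetical, and the covariate list would come from your own analysis plan.

```sas
/* Assumption: work.ops and all variable names are illustrative. */
proc logistic data=work.ops;
   class gender / param=ref;
   /* Model the probability of being a case. */
   model complication_x(event='Yes') = age gender;
   /* pscore holds the predicted probability, i.e. the propensity score. */
   output out=work.ps predicted=pscore;
run;
```

The resulting work.ps dataset carries one pscore per patient, which is the input a score-based matching step would need.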

 

sasnewbie12
Obsidian | Level 7

Thanks for the responses

 

 

I think I should explain more clearly what I want to do; I may not have been clear before.

 

I am replicating a study that was previously done with a different and more comprehensive dataset. This will be a cross-sectional study. 

We will look for cases of complication X that occurred during clinical operations A, B and C in a large dataset of millions of cases. Then we will have to compare these cases of complication X with controls who underwent operations A, B and C but did not have complication X (from the same dataset). We will assess which types of patients are more likely to have complication X during operations A, B and C. 

 

I can identify the cases. However, I am not sure how to go back and randomly select controls from the same population. 

Is Proc Surveyselect what I need? 

 

 

Thanks

Reeza
Super User

I can identify the cases. However, I am not sure how to go back and randomly select controls from the same population. Is Proc Surveyselect what I need?

 

 

PROC SURVEYSELECT does random selection. It's very, very unlikely a random selection was used as the case control methodology. 

 

And the matching process isn't a straightforward procedure, such as saying use PROC GIVE_ME_ANSWER. There are many ways that can be implemented and each method will likely change your results. 

KachiM
Rhodochrosite | Level 12

I have not used Proc SurveySelect. This is my small advice.

 

For every Case there can be hundreds of Controls. Before you select a Control, decide on the patient characteristics that have a direct impact on the outcome variable (Complication X). Usually these are Age, Gender, any residential characteristics that serve as an indirect indicator of economic status, plus any other variables that may be associated with X. An exact match between Case and Control on all of these may not be feasible, so some ranges have to be accepted. You can match exactly on Gender, but usually not on Age; for example, the Age of the Case plus or minus 1 may be accepted when selecting the Control. These relaxed conditions should be fixed at this stage. An important note: selecting more than 3 Controls per Case is not profitable in Case-Control studies; fewer is better. Let the number of Controls be 2.

 

Select a Case with Complication X.

     Note its Age, Gender, and the other variables you have decided on.

Find the patients without Complication X.

     Match on Age, Gender, and the other variables. Count the number of matches.

     If only 1 Control is found, choose that patient.

     If you have more than, say, 10, then choose 2 of them at random. You may use Floyd's sampling algorithm for this.

     If no Control is found for a selected Case, relax the matching criteria as decided in advance, before the selection process.
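
The steps above could be sketched in SAS roughly as follows. This is one possible reading, not a definitive implementation. It assumes two hypothetical datasets, work.cases and work.controls, each with patient_id, age and gender. It matches exactly on gender and within 1 year on age, then keeps up to 2 randomly chosen Controls per Case. A random sort stands in for Floyd's algorithm here.

```sas
/* Assumption: work.cases and work.controls and their variables are illustrative. */

/* Pair every case with every eligible control. */
proc sql;
   create table work.candidates as
   select c.patient_id as case_id,
          k.patient_id as control_id
   from work.cases as c
        inner join work.controls as k
        on  c.gender = k.gender
        and abs(c.age - k.age) <= 1;
quit;

/* Attach a random number to each candidate pair. */
data work.candidates_r;
   if _n_ = 1 then call streaminit(2024);   /* arbitrary seed */
   set work.candidates;
   u = rand('uniform');
run;

/* Put the candidates of each case in random order. */
proc sort data=work.candidates_r;
   by case_id u;
run;

/* Keep at most 2 controls per case. */
data work.matched;
   set work.candidates_r;
   by case_id;
   if first.case_id then n_picked = 0;
   n_picked + 1;
   if n_picked <= 2;
run;
```

Two caveats with this sketch: the same Control can be picked for more than one Case, and a Case with no eligible Control simply drops out of work.candidates, so those Cases would need a second pass with the relaxed criteria. With millions of records the join can also get very large.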

    
