I have a dataset that has cases and controls. I need to create a new dataset in which cases are matched to controls at a ratio of at least 1:4, based on 2 variables. I am at a loss as to where to start.
Obs  Type     Rank  Gender  Score1  Score2
1    Control  1     2       10      5
2    case     2     1       1       1
3    control  1     1       1       3
...
So in the end I would like to see each case with 4 controls underneath it, matching on Rank and Gender:
Obs  Type     Rank  Gender  Score1  Score2
1    case     1     2       17      5
2    control  1     2       71      1
3    control  1     2       71      3
4    control  1     2       8       16
Any help would be appreciated! Maybe a macro?
It all depends.
Neither your starting point nor what you want to achieve is clear enough yet. It looks like you are setting up a dataset to be used for data mining.
If you have too few observations, you can boost them.
If you have many observations, you can sample them in a ratio.
If you have some requirements within series .... you can accommodate that.
What is your design/analysis?
My dataset has about 5000 observations. I simplified it in my explanation for simplicity's sake, but it has cases and controls and their responses to survey questions. I would like a 1:4 ratio of cases to controls, matched on Military Rank (5 categories) and Gender. The design is a retrospective matched case-control analysis.
Google "Mayo macro matching".
I have never used a macro program this complicated. How do I use one in my own program? Do I just copy it and fill in the appropriate variables?
Read the documentation in the code. Preferably read the code as well.
The documentation includes an example, and at the bottom of the code there is another example with sample data and a sample macro call.
I am using this one: http://www.mayo.edu/research/documents/gmatchsas/DOC-10027248 But I am getting no output and no errors in the log. I am new to macros, so I appreciate the patience and help!
So no datasets get created and there are no errors in the log either? Post the log instead of the code.
If you would like to learn how to program this yourself, instead of using a macro that you probably don't understand and that can do more than what you need, the steps are not so difficult.
1. Separate your observations into two data sets: treatment and control.
2. From the treatment data set, run a PROC FREQ on the combination of the key variables, and send the results to an output data set.
3. For the control data set, assign a random number to each observation.
4. Sort the control data set by the key variables plus one extra variable: the random number.
5. Merge the sorted control data set with the output data set from PROC FREQ (step 2). Use the COUNT variable from PROC FREQ to determine which control observations to keep and which to delete.
6. Combine the selected control observations with the treatment observations.
I know that's an overview, and you may need help pursuing this. Also note that this approach doesn't match up controls to specific treatment observations. If you have two treatment observations with the same key values, it will pick the right number of controls to match up to both treatment observations combined. You could randomly assign them to a particular treatment observation at that point, if needed. Note that there is no guarantee that your control data set contains enough matching observations for every treatment observation.
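The six steps above can be sketched roughly as follows. This is a minimal, untested outline rather than a definitive implementation. It assumes the input data set is called `have` (a hypothetical name), that `Type` holds the text values "case" and "control" in any letter case, and that the key variables are `Rank` and `Gender` as in the question. The work data set names, the random-number variable `u`, and the seed are also just illustrative.

```sas
/* Step 1: split into cases and controls (Type compared case-insensitively) */
data cases controls;
  set have;                                  /* "have" is an assumed name */
  if upcase(Type) = 'CASE' then output cases;
  else if upcase(Type) = 'CONTROL' then output controls;
run;

/* Step 2: count the cases in each Rank*Gender combination */
proc freq data=cases noprint;
  tables Rank*Gender / out=case_counts(keep=Rank Gender count);
run;

/* Step 3: give every control a random number */
data controls;
  set controls;
  if _n_ = 1 then call streaminit(12345);    /* arbitrary seed, for repeatability */
  u = rand('uniform');
run;

/* Step 4: sort controls by the key variables, then by the random number */
proc sort data=controls;
  by Rank Gender u;
run;

/* Step 5: within each key combination keep the first 4*COUNT controls */
data picked;
  merge controls(in=inctl) case_counts(in=incase);
  by Rank Gender;
  if first.Gender then n = 0;                /* restart the counter per combination */
  n + 1;
  if inctl and incase and n <= 4 * count;    /* drop combinations that have no cases */
  drop n u count;
run;

/* Step 6: stack the cases and the selected controls */
data matched;
  set cases picked;
run;
```

Because the controls are in random order within each Rank and Gender combination, taking the first `4 * count` of them amounts to a simple random draw. A combination with fewer controls than that simply contributes all of its controls, so it is worth comparing the control counts in `picked` with `case_counts` afterwards.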
Finally, I haven't used PROC SURVEYSELECT very much. It's possible that it has the built-in capabilities to do this easily.
Good luck.
slivingston, since you may be new to SAS: there are a lot of studies presented on this topic (cases and controls).
Google it (/#q=Matching+cases+and+controls++site%3Asas.com&start=20) and leave out your own questions.
The results should give you some direction. I agree with Astounding's remarks about the work involved.