Hi,
I have the following code joining two tables by an unique identifier, in which a disease population is joined with a background population, creating approx 1000 matches per person in the disease population.
proc sql;
create table new as
select distinct case_id, id
from disease_pop inner join background_pop
on disease_pop.case_id = background_pop.id;
This makes a table that looks like this:
| CASE_ID | ID |
| 123 | 345 |
| 123 | 567 |
| 123 | 891 |
| 234 | 902 |
| 234 | 903 |
| 234 | 904 |
Only there are approximately 1000 "IDs" for each CASE ID
So what I want to do next is reduce the number of matches, so instead of having 1000 IDs per Case ID I would like only 10 IDs per CASE ID, but I still want to have all CASE IDs in my new table.
Any suggestions?
Hey so an update:
I ended up using PROC SURVEYSELECT and defining each patient as one stratum. Worked perfectly.
And what is the rule for reducing no of id's?
There are several ways of creating samples in SAS.
In a raw data step you could do BY case_id, start a counter, and do implicit OUTPUT as long the counter is <= 10.
Reset the counter for each new BY value (IF first.case_id THEN...).
Thanks for you reply. There is no rule as everyone in the background population is matched by the same variables. I will attempt your suggestion.
Hey so an update:
I ended up using PROC SURVEYSELECT and defining each patient as one stratum. Worked perfectly.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!