BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
ha33
Obsidian | Level 7

Hi,


I have the following code joining two tables by an unique identifier, in which a disease population is joined with a background population, creating approx 1000 matches per person in the disease population.

 

proc sql;

create table new as

select distinct case_id, id

from disease_pop inner join background_pop

on disease_pop.case_id = background_pop.id;

 

This makes a table that looks like this:

 

CASE_IDID
123345
123567
123891
234902
234903
234904

 

Only there are approximately 1000 "IDs" for each CASE ID

 

So what I want to do next is reduce the number of matches, so instead of having 1000 IDs per Case ID I would like only 10 IDs per CASE ID, but I still want to have all CASE IDs in my new table.

 

Any suggestions?

 

1 ACCEPTED SOLUTION

Accepted Solutions
ha33
Obsidian | Level 7

Hey so an update:

 

I ended up using PROC SURVEYSELECT and defining each patient as one stratum. Worked perfectly. 

 

 

View solution in original post

3 REPLIES 3
LinusH
Tourmaline | Level 20

And what is the rule for reducing no of id's?

There are several ways of  creating samples in SAS.

In a raw data step you could do BY case_id, start a counter, and do implicit OUTPUT as long the counter is <= 10.

Reset the counter for each new BY value (IF first.case_id THEN...).

Data never sleeps
ha33
Obsidian | Level 7

Thanks for you reply. There is no rule as everyone in the background population is matched by the same variables. I will attempt your suggestion. 

ha33
Obsidian | Level 7

Hey so an update:

 

I ended up using PROC SURVEYSELECT and defining each patient as one stratum. Worked perfectly.