Solved: Re: Selecting fixed number of matches in an inner join

ha33 · Posted 02-07-2023 04:16 AM

Hi,

I have the following code joining two tables by a unique identifier, in which a disease population is joined with a background population, creating approximately 1000 matches per person in the disease population.

proc sql;
create table new as
select distinct case_id, id
from disease_pop inner join background_pop
on disease_pop.case_id = background_pop.id;
quit;

This makes a table that looks like this:

CASE_ID	ID
123	345
123	567
123	891
234	902
234	903
234	904

Except that there are approximately 1000 IDs for each CASE_ID.

What I want to do next is reduce the number of matches: instead of 1000 IDs per CASE_ID, I would like only 10 IDs per CASE_ID, while still keeping every CASE_ID in my new table.

Any suggestions?

LinusH · Posted 02-07-2023 04:40 AM

And what is the rule for reducing the number of IDs?

There are several ways of creating samples in SAS.

In a plain DATA step you could use BY case_id, start a counter, and let the implicit OUTPUT run as long as the counter is <= 10.

Reset the counter for each new BY value (IF first.case_id THEN...).
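
The counter approach described above could look roughly like the sketch below. It assumes the joined table is called new, as in the original question; the data set names new_sorted and new_10 and the counter variable n are made up for illustration.

```sas
/* Sketch only: keep the first 10 rows per case_id from the joined table NEW. */
/* new_sorted, new_10 and n are hypothetical names, not from the thread.      */
proc sort data=new out=new_sorted;
    by case_id;
run;

data new_10;
    set new_sorted;
    by case_id;
    if first.case_id then n = 0;  /* reset the counter for each new case_id      */
    n + 1;                        /* sum statement: n is retained across rows    */
    if n <= 10;                   /* subsetting IF: implicit OUTPUT for rows 1-10 */
    drop n;
run;
```

Note that this keeps the first 10 rows in sort order for each case_id, so it is not a random sample. A case_id with fewer than 10 matches simply keeps all of its rows.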

ha33 · Posted 02-07-2023 04:44 AM

Thanks for your reply. There is no rule, as everyone in the background population is matched by the same variables. I will attempt your suggestion.

ha33 · Posted 02-13-2023 03:26 AM

Hey so an update:

I ended up using PROC SURVEYSELECT and defining each patient as one stratum. Worked perfectly.
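
The exact code was not posted, so the following is only a sketch of what a stratified PROC SURVEYSELECT call along these lines might look like. It assumes the stratum variable is case_id and that the joined table is new; the output names new_sorted and new_sample and the seed value are arbitrary choices for illustration.

```sas
/* Sketch only: draw 10 random matches per case_id.                     */
/* new_sorted, new_sample and the seed are hypothetical, not from the   */
/* thread. STRATA expects the input sorted by the stratum variable.     */
proc sort data=new out=new_sorted;
    by case_id;
run;

proc surveyselect data=new_sorted
                  out=new_sample
                  method=srs      /* simple random sampling without replacement */
                  sampsize=10     /* 10 rows per stratum                        */
                  selectall       /* keep all rows if a stratum has fewer than 10 */
                  seed=12345      /* arbitrary seed, for reproducibility        */
                  noprint;        /* suppress the summary listing               */
    strata case_id;
run;
```

Unlike a counter in a DATA step, this picks the 10 matches at random within each case_id, and fixing the seed makes the selection repeatable.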
