DATA Step, Macro, Functions and more

Select observations from dataset based on id numbers in another dataset

Accepted Solution Solved
Reply
Contributor
Posts: 32
Accepted Solution

Select observations from dataset based on id numbers in another dataset

Hello, I have a dataset with 20,000 unique identifiers and multiple observations per identifier. I need to select all the observations for 65 of those identifiers and output all the observations for those 65. The list of unique identifiers is in another dataset. Here is a sample of what I'm trying to accomplish:

 

data have:

dataset 1:                                 

visit_id      study_id       var     var    var

1                   1              

2                   1              

3                   1              

4                   2        

5                   2

6                   3

7                   3

8                   4

9                   5

10...              5...

 

dataset 2:

study_id

1

3

5

17

92

81

180

1299...      

 

 

data want:

visit_id      study_id       var     var    var

1                   1              

2                   1              

3                   1              

6                   3

7                   3

9                   5

10                 5

810              17

811..            17..

 

 

I know how to do this in PROC SQL with a WHERE study_id IN statement. But I don't want to have to list out the 65 id numbers. Any ideas? Thank you!


Accepted Solutions
Solution
‎09-01-2017 02:13 PM
PROC Star
Posts: 283

Re: Select observations from dataset based on id numbers in another dataset

proc sort data=dataset1;

by study_id;

run;

 

proc sort data=dataset2;

by study_id;

run;

 

data want;

merge dataset1 dataset2(in=in2);

by study_id;

if in2;

run;

 

View solution in original post


All Replies
PROC Star
Posts: 283

Re: Select observations from dataset based on id numbers in another dataset

[ Edited ]

It appears what you need is a simple merge?

sort the two datasets by study_id and merge by study_id

 

Use the IN= Dataset option to mark the dataset that contributes to observation in the PDV and apply the marker in a subsetting if statement to only keep the contributing obs from dataset2.

Contributor
Posts: 32

Re: Select observations from dataset based on id numbers in another dataset

Posted in reply to novinosrin

Can you give me sample code?

The two datasets have all the same variables. Dataset 2 is a subset of dataset one. When I tried to merge, it didn't work.

 

Solution
‎09-01-2017 02:13 PM
PROC Star
Posts: 283

Re: Select observations from dataset based on id numbers in another dataset

proc sort data=dataset1;

by study_id;

run;

 

proc sort data=dataset2;

by study_id;

run;

 

data want;

merge dataset1 dataset2(in=in2);

by study_id;

if in2;

run;

 

Contributor
Posts: 32

Re: Select observations from dataset based on id numbers in another dataset

Posted in reply to novinosrin

What does the in2 mean?

PROC Star
Posts: 283

Re: Select observations from dataset based on id numbers in another dataset

When you use a dataset option IN=, SAS basically creates a temp variable as a binary marker to keep a flag on the observations that dataset contributed to the observation construction at the time of execution. In2 is just a name that I gave and you are free to name anything you please. The temp variable is never written to the output dataset.

Contributor
Posts: 32

Re: Select observations from dataset based on id numbers in another dataset

Posted in reply to novinosrin

Okay, simple enought, thanks so much!

 

Super User
Posts: 11,343

Re: Select observations from dataset based on id numbers in another dataset

Here's one way:

 

proc sql;
   create table want as
   select b.*
   from dataset2 as a left join
        dataset1 as b
        on a.study_id=b.study_id
   ;
quit;

the a and b are aliases so you can type a.study_id instead dataset1.study_id. If you haven't used. SQL you have to reference the table as well as the variable when it exists in two or more of the tables involved so SAS uses the right one.

 

Left Join says take every value in the the set on the left (dataset2) and match the values with the other set based on the criteria following join. Note that if Study_id is duplicated in dataset2 then each value matches ALL of the same study_id in the dataset1 and you'll have lots more records than you expect.

Contributor
Posts: 32

Re: Select observations from dataset based on id numbers in another dataset

Thanks for the reply!

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 178 views
  • 0 likes
  • 3 in conversation