Solved: Select observations from dataset based on id numbers in another datase...

eabc0351 · Posted 09-01-2017 01:09 PM

Hello, I have a dataset with 20,000 unique identifiers and multiple observations per identifier. I need to select all the observations for 65 of those identifiers and output all the observations for those 65. The list of unique identifiers is in another dataset. Here is a sample of what I'm trying to accomplish:

data have:

dataset 1:

visit_id study_id var var var

1 1

2 1

3 1

4 2

5 2

6 3

7 3

8 4

9 5

10... 5...

dataset 2:

study_id

1

3

5

17

92

81

180

1299...

data want:

visit_id study_id var var var

1 1

2 1

3 1

6 3

7 3

9 5

10 5

810 17

811.. 17..

I know how to do this in PROC SQL with a WHERE study_id IN statement. But I don't want to have to list out the 65 id numbers. Any ideas? Thank you!

novinosrin · Posted 09-01-2017 01:40 PM

proc sort data=dataset1;

by study_id;

run;

proc sort data=dataset2;

by study_id;

run;

data want;

merge dataset1 dataset2(in=in2);

by study_id;

if in2;

run;

View solution in original post

novinosrin · Posted 09-01-2017 01:21 PM

It appears what you need is a simple merge?

sort the two datasets by study_id and merge by study_id

Use the IN= Dataset option to mark the dataset that contributes to observation in the PDV and apply the marker in a subsetting if statement to only keep the contributing obs from dataset2.

eabc0351 · Posted 09-01-2017 01:37 PM

Can you give me sample code?

The two datasets have all the same variables. Dataset 2 is a subset of dataset one. When I tried to merge, it didn't work.

novinosrin · Posted 09-01-2017 01:40 PM

proc sort data=dataset1;

by study_id;

run;

proc sort data=dataset2;

by study_id;

run;

data want;

merge dataset1 dataset2(in=in2);

by study_id;

if in2;

run;

eabc0351 · Posted 09-01-2017 02:00 PM

What does the in2 mean?

novinosrin · Posted 09-01-2017 02:05 PM

When you use a dataset option IN=, SAS basically creates a temp variable as a binary marker to keep a flag on the observations that dataset contributed to the observation construction at the time of execution. In2 is just a name that I gave and you are free to name anything you please. The temp variable is never written to the output dataset.

eabc0351 · Posted 09-01-2017 02:12 PM

Okay, simple enought, thanks so much!

ballardw · Posted 09-01-2017 01:42 PM

Here's one way:

proc sql;
   create table want as
   select b.*
   from dataset2 as a left join
        dataset1 as b
        on a.study_id=b.study_id
   ;
quit;

the a and b are aliases so you can type a.study_id instead dataset1.study_id. If you haven't used. SQL you have to reference the table as well as the variable when it exists in two or more of the tables involved so SAS uses the right one.

Left Join says take every value in the the set on the left (dataset2) and match the values with the other set based on the criteria following join. Note that if Study_id is duplicated in dataset2 then each value matches ALL of the same study_id in the dataset1 and you'll have lots more records than you expect.

eabc0351 · Posted 09-01-2017 02:13 PM

Thanks for the reply!

Select observations from dataset based on id numbers in another dataset

Re: Select observations from dataset based on id numbers in another dataset

Re: Select observations from dataset based on id numbers in another dataset

Re: Select observations from dataset based on id numbers in another dataset

Re: Select observations from dataset based on id numbers in another dataset

Re: Select observations from dataset based on id numbers in another dataset

Re: Select observations from dataset based on id numbers in another dataset

Re: Select observations from dataset based on id numbers in another dataset

Re: Select observations from dataset based on id numbers in another dataset

Re: Select observations from dataset based on id numbers in another dataset

Classroom Training Available!