BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
eabc0351
Quartz | Level 8

Hello, I have a dataset with 20,000 unique identifiers and multiple observations per identifier. I need to select all the observations for 65 of those identifiers and output all the observations for those 65. The list of unique identifiers is in another dataset. Here is a sample of what I'm trying to accomplish:

 

data have:

dataset 1:                                 

visit_id      study_id       var     var    var

1                   1              

2                   1              

3                   1              

4                   2        

5                   2

6                   3

7                   3

8                   4

9                   5

10...              5...

 

dataset 2:

study_id

1

3

5

17

92

81

180

1299...      

 

 

data want:

visit_id      study_id       var     var    var

1                   1              

2                   1              

3                   1              

6                   3

7                   3

9                   5

10                 5

810              17

811..            17..

 

 

I know how to do this in PROC SQL with a WHERE study_id IN statement. But I don't want to have to list out the 65 id numbers. Any ideas? Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

proc sort data=dataset1;

by study_id;

run;

 

proc sort data=dataset2;

by study_id;

run;

 

data want;

merge dataset1 dataset2(in=in2);

by study_id;

if in2;

run;

 

View solution in original post

8 REPLIES 8
novinosrin
Tourmaline | Level 20

It appears what you need is a simple merge?

sort the two datasets by study_id and merge by study_id

 

Use the IN= Dataset option to mark the dataset that contributes to observation in the PDV and apply the marker in a subsetting if statement to only keep the contributing obs from dataset2.

eabc0351
Quartz | Level 8

Can you give me sample code?

The two datasets have all the same variables. Dataset 2 is a subset of dataset one. When I tried to merge, it didn't work.

 

novinosrin
Tourmaline | Level 20

proc sort data=dataset1;

by study_id;

run;

 

proc sort data=dataset2;

by study_id;

run;

 

data want;

merge dataset1 dataset2(in=in2);

by study_id;

if in2;

run;

 

novinosrin
Tourmaline | Level 20

When you use a dataset option IN=, SAS basically creates a temp variable as a binary marker to keep a flag on the observations that dataset contributed to the observation construction at the time of execution. In2 is just a name that I gave and you are free to name anything you please. The temp variable is never written to the output dataset.

eabc0351
Quartz | Level 8

Okay, simple enought, thanks so much!

 

ballardw
Super User

Here's one way:

 

proc sql;
   create table want as
   select b.*
   from dataset2 as a left join
        dataset1 as b
        on a.study_id=b.study_id
   ;
quit;

the a and b are aliases so you can type a.study_id instead dataset1.study_id. If you haven't used. SQL you have to reference the table as well as the variable when it exists in two or more of the tables involved so SAS uses the right one.

 

Left Join says take every value in the the set on the left (dataset2) and match the values with the other set based on the criteria following join. Note that if Study_id is duplicated in dataset2 then each value matches ALL of the same study_id in the dataset1 and you'll have lots more records than you expect.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 13848 views
  • 0 likes
  • 3 in conversation