BookmarkSubscribeRSS Feed
Lefty
Obsidian | Level 7

Hi,

I'm trying to merge three long (meaning the each ID has multiple rows of data) datasets but I know that about 150 of the IDs scattered throughout these datasets are people I want to exclude from my analysis. I have another dataset that contains only the IDs I want. How can I create a final long dataset that has all three of the long datasets, but that includes only the IDs that I want? I have tried to use in=a in the datastep but it didn't work. I tried something like this:

data merged;

set long1 long2 long3 idfile (in=a) ;

if a;

by id;

run;

Thanks!

Laurie

4 REPLIES 4
art297
Opal | Level 21

One way might be something like the following untested code:

data _long1;

  merge long1(in=a) idfile (in=b) ;

  if a and b;

  by id;

run;

data _long2;

  merge long2(in=a) idfile (in=b) ;

  if a and b;

  by id;

run;

data _long3;

  merge long3(in=a) idfile (in=b) ;

  if a and b;

  by id;

run;

data merged;

  set _long1 _long2 _long3;

  by id;

run;

Lefty
Obsidian | Level 7

Thanks so much for your thoughts. I tried it, and it didn't work, but it's because I wasn't very clear in my question. For the 3 long datasets, I don't want to merge them by the ID, I want them to go on top of one another (concatenate?) so each ID still has many rows of data. So it works to combine them by using the set command:

data mergedlong;

set long1 long2 long3;

run;

Then I used (essentially) your code to do the merge of combined long datasets plus the ID file which kicked out the IDs I didn't want:

proc sort data=mergedlong; by id;

data merged;

merge mergedlong (in=a) idfile (in=b);

if a and b;

by id; run;

Worked great! Thanks!!

art297
Opal | Level 21

Glad to hear you solved your problem.  However, for others facing a similar situation, both Tom's and my suggested code were designed to concatenate, not merge, the 3 files.  I really liked Tom's suggested method.

My guess, as to why it didn't initially work, was that the files weren't all first sorted by id.

However, the method you selected definitely accomplishes the task, too, and only required the one sort (I presume that the idfile was already sorted).

Tom
Super User Tom
Super User

Put the IDFILE first in the list and remember the setting of the IN= variable from the first observation for the current ID value.

data merged;

  set idfile(in=a) long1 long2 long3 ;

  by id;

  retain in_idfile;

  if first.id then in_idfile=a;

  if in_idfile;

run;


If you do not want to include extra records caused by the IDFILE then you might want to change your subsetting IF condition.

data merged;

  set idfile(in=a) long1 long2 long3 ;

  by id;

  retain in_idfile;

  if first.id then in_idfile=a;

  if in_idfile and NOT a ;

run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 2095 views
  • 3 likes
  • 3 in conversation