BookmarkSubscribeRSS Feed
Lefty
Obsidian | Level 7

Hi,

I'm trying to merge three long (meaning the each ID has multiple rows of data) datasets but I know that about 150 of the IDs scattered throughout these datasets are people I want to exclude from my analysis. I have another dataset that contains only the IDs I want. How can I create a final long dataset that has all three of the long datasets, but that includes only the IDs that I want? I have tried to use in=a in the datastep but it didn't work. I tried something like this:

data merged;

set long1 long2 long3 idfile (in=a) ;

if a;

by id;

run;

Thanks!

Laurie

4 REPLIES 4
art297
Opal | Level 21

One way might be something like the following untested code:

data _long1;

  merge long1(in=a) idfile (in=b) ;

  if a and b;

  by id;

run;

data _long2;

  merge long2(in=a) idfile (in=b) ;

  if a and b;

  by id;

run;

data _long3;

  merge long3(in=a) idfile (in=b) ;

  if a and b;

  by id;

run;

data merged;

  set _long1 _long2 _long3;

  by id;

run;

Lefty
Obsidian | Level 7

Thanks so much for your thoughts. I tried it, and it didn't work, but it's because I wasn't very clear in my question. For the 3 long datasets, I don't want to merge them by the ID, I want them to go on top of one another (concatenate?) so each ID still has many rows of data. So it works to combine them by using the set command:

data mergedlong;

set long1 long2 long3;

run;

Then I used (essentially) your code to do the merge of combined long datasets plus the ID file which kicked out the IDs I didn't want:

proc sort data=mergedlong; by id;

data merged;

merge mergedlong (in=a) idfile (in=b);

if a and b;

by id; run;

Worked great! Thanks!!

art297
Opal | Level 21

Glad to hear you solved your problem.  However, for others facing a similar situation, both Tom's and my suggested code were designed to concatenate, not merge, the 3 files.  I really liked Tom's suggested method.

My guess, as to why it didn't initially work, was that the files weren't all first sorted by id.

However, the method you selected definitely accomplishes the task, too, and only required the one sort (I presume that the idfile was already sorted).

Tom
Super User Tom
Super User

Put the IDFILE first in the list and remember the setting of the IN= variable from the first observation for the current ID value.

data merged;

  set idfile(in=a) long1 long2 long3 ;

  by id;

  retain in_idfile;

  if first.id then in_idfile=a;

  if in_idfile;

run;


If you do not want to include extra records caused by the IDFILE then you might want to change your subsetting IF condition.

data merged;

  set idfile(in=a) long1 long2 long3 ;

  by id;

  retain in_idfile;

  if first.id then in_idfile=a;

  if in_idfile and NOT a ;

run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 2890 views
  • 3 likes
  • 3 in conversation