BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
_MVB_
Obsidian | Level 7

Hi All!

Looking for a quick and short solution (a replacement to a 2-3 steps logic I have), where I want to use a data step which will include merging two datasets (exactly same variables on both), and while merging also create a new variable called identifier which will specify whether a record coming from dataset1 or dataset2 on the output dataset.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
Are you merging or appending? Usually with the same variables its an append and then you can use INDSNAME option.

data want;
set have1 have2 indsname= source;
data_source = source;
run;

View solution in original post

17 REPLIES 17
smantha
Lapis Lazuli | Level 10
data output;
merge a(in=ina) b(in=inb);
by <byvar list>;
if ina and inb then exists='inainb';
else if ina then exists ='ina';
else if inb then exists ='inb';
run;
_MVB_
Obsidian | Level 7

@smantha Thanks for a quick reply.

 

I think I need a slightly different logic, as I do not need to check whether a record is in the first/second/both datasets.

I know I have exactly same records on both datasets, as second is a reduced dataset created from first dataset.

 

I simply need to create an indicator while merging the two datasets, so if the record coming from dataset1 it will mark it as A, but if a record coming from dataset2 it will mark it as B, and put it on a newly created variable.

 

Hope this clarifies better.

Reeza
Super User
Are you merging or appending? Usually with the same variables its an append and then you can use INDSNAME option.

data want;
set have1 have2 indsname= source;
data_source = source;
run;
_MVB_
Obsidian | Level 7

@Reeza combining two datasets

on the dataset2 I have exactly same records and on dataset1, the difference in between the two is that dataset1 has several times more records, and all of the records on dataset2 would be found on dataset1.

I do not need to check if any of the records are on both or only one dataset, but simply combine the two dataset and have a new variable created which would specify either record is coming from dataset1 or dataset2.

Reeza
Super User

@_MVB_ wrote:

@Reeza combining two datasets

on the dataset2 I have exactly same records and on dataset1, the difference in between the two is that dataset1 has several times more records, and all of the records on dataset2 would be found on dataset1.

I do not need to check if any of the records are on both or only one dataset, but simply combine the two dataset and have a new variable created which would specify either record is coming from dataset1 or dataset2.


Did you run it and try it?

_MVB_
Obsidian | Level 7

@Reeza it works, however I do not want to have a work.database1 or work.database2 in my output dataset. I can, of course put an extra line in that procedure where specify if work.database1 then A, else if work.database2 then B, but isn't there another way just to specify either A or B without IF THEN statement?

Reeza
Super User
Then use the IN approach which gives you an indicator variable instead. If you absolutely, 100% require A, B then use IF/THEN. There are automated ways to do it, but if your requirements don't align then you need to use IF/THEN. You need to make sure it handles multiple data sets as well.
Patrick
Opal | Level 21

@_MVB_ wrote:

@Reeza it works, however I do not want to have a work.database1 or work.database2 in my output dataset. I can, of course put an extra line in that procedure where specify if work.database1 then A, else if work.database2 then B, but isn't there another way just to specify either A or B without IF THEN statement?


May be we get faster to your desired result using sample data.

With below sample data how should the desired result look like and how does it differ from what below code generates?

data a;
  do i=1 to 10;
    output;
  end;
  stop;
run;
data b;
  set a;
  if i in (1,5,7);
run;

data want;
  length _inds inds $41;
  set a b indsname=_inds;
  inds=_inds;
run;

proc print data=want;
run;

Patrick_0-1592605395846.png

 

 

_MVB_
Obsidian | Level 7

In the end, I modified solution proposed by @Reeza just adding IF THEN statement to replace work.DATASET1 and work.DATASET2 fields with A or B.

Patrick
Opal | Level 21

Thanks for the feedback.

Please mark one of @Reeza's answers as solution so that this discussion gets closed. 

Reeza
Super User
data want;
set have (in=a1) have2 (in=a2);
source = ifn(a1=1, 'A', 'B');
run;
_MVB_
Obsidian | Level 7

@Reeza Thanks!

This is what I was looking for in terms of the short solution.

ballardw
Super User

It would help to show the code you are currently using.

 

The dataset option IN= creates a variable indicating that a record came from that dataset

 

Pseudo example:

data garbage;
   merge set1 (in=in1)
              set2 (in=in2)
  ;
  if in1 and in2 then source='Both';
  else if in1 then source = 'set1';
  else if in2 then source ='set2';
run;

I have to assume your merge would have some by variable(s) but that wouldn't make a difference in the options and if/then/else code.

_MVB_
Obsidian | Level 7

This is what I have now:

 - dataset1 has 10,000 records

 - dataset2 has 1,000 records

 

data dataset1;

set dataset1;

format ds $1.;

ds='A';

run;

 

data dataset2;

set dataset2;

format ds $1.

ds='B';

run;

 

data merged;

set dataset1 dataset2;

run;

 

Final 'merged' dataset has 11,000 records with one new variable 'ds' which has 'A' for the 10,000 records and 'B' for other 1,000 records.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 17 replies
  • 4927 views
  • 6 likes
  • 5 in conversation