SAS Programming

_MVB_ · Posted 06-19-2020 04:00 PM

Hi All!

Looking for a quick and short solution (a replacement to a 2-3 steps logic I have), where I want to use a data step which will include merging two datasets (exactly same variables on both), and while merging also create a new variable called identifier which will specify whether a record coming from dataset1 or dataset2 on the output dataset.

Reeza · Posted 06-19-2020 04:19 PM

Are you merging or appending? Usually with the same variables its an append and then you can use INDSNAME option.

data want;
set have1 have2 indsname= source;
data_source = source;
run;

View solution in original post

smantha · Posted 06-19-2020 04:07 PM

data output;
merge a(in=ina) b(in=inb);
by <byvar list>;
if ina and inb then exists='inainb';
else if ina then exists ='ina';
else if inb then exists ='inb';
run;

_MVB_ · Posted 06-19-2020 04:14 PM

@smantha Thanks for a quick reply.

I think I need a slightly different logic, as I do not need to check whether a record is in the first/second/both datasets.

I know I have exactly same records on both datasets, as second is a reduced dataset created from first dataset.

I simply need to create an indicator while merging the two datasets, so if the record coming from dataset1 it will mark it as A, but if a record coming from dataset2 it will mark it as B, and put it on a newly created variable.

Hope this clarifies better.

Reeza · Posted 06-19-2020 04:19 PM

Are you merging or appending? Usually with the same variables its an append and then you can use INDSNAME option.

data want;
set have1 have2 indsname= source;
data_source = source;
run;

_MVB_ · Posted 06-19-2020 04:45 PM

@Reeza combining two datasets

on the dataset2 I have exactly same records and on dataset1, the difference in between the two is that dataset1 has several times more records, and all of the records on dataset2 would be found on dataset1.

I do not need to check if any of the records are on both or only one dataset, but simply combine the two dataset and have a new variable created which would specify either record is coming from dataset1 or dataset2.

Reeza · Posted 06-19-2020 05:09 PM

@_MVB_ wrote:

@Reeza combining two datasets

on the dataset2 I have exactly same records and on dataset1, the difference in between the two is that dataset1 has several times more records, and all of the records on dataset2 would be found on dataset1.

I do not need to check if any of the records are on both or only one dataset, but simply combine the two dataset and have a new variable created which would specify either record is coming from dataset1 or dataset2.

Did you run it and try it?

_MVB_ · Posted 06-19-2020 05:53 PM

@Reeza it works, however I do not want to have a work.database1 or work.database2 in my output dataset. I can, of course put an extra line in that procedure where specify if work.database1 then A, else if work.database2 then B, but isn't there another way just to specify either A or B without IF THEN statement?

Reeza · Posted 06-19-2020 05:59 PM

Then use the IN approach which gives you an indicator variable instead. If you absolutely, 100% require A, B then use IF/THEN. There are automated ways to do it, but if your requirements don't align then you need to use IF/THEN. You need to make sure it handles multiple data sets as well.

Patrick · Posted 06-19-2020 06:25 PM

@_MVB_ wrote:

@Reeza it works, however I do not want to have a work.database1 or work.database2 in my output dataset. I can, of course put an extra line in that procedure where specify if work.database1 then A, else if work.database2 then B, but isn't there another way just to specify either A or B without IF THEN statement?

May be we get faster to your desired result using sample data.

With below sample data how should the desired result look like and how does it differ from what below code generates?

data a;
  do i=1 to 10;
    output;
  end;
  stop;
run;
data b;
  set a;
  if i in (1,5,7);
run;

data want;
  length _inds inds $41;
  set a b indsname=_inds;
  inds=_inds;
run;

proc print data=want;
run;

_MVB_ · Posted 06-19-2020 08:18 PM

In the end, I modified solution proposed by @Reeza just adding IF THEN statement to replace work.DATASET1 and work.DATASET2 fields with A or B.

Patrick · Posted 06-19-2020 08:27 PM

Thanks for the feedback.

Please mark one of @Reeza's answers as solution so that this discussion gets closed.

Reeza · Posted 06-20-2020 12:21 PM

data want;
set have (in=a1) have2 (in=a2);
source = ifn(a1=1, 'A', 'B');
run;

_MVB_ · Posted 06-24-2020 12:52 PM

@Reeza Thanks!

This is what I was looking for in terms of the short solution.

ballardw · Posted 06-19-2020 04:21 PM

It would help to show the code you are currently using.

The dataset option IN= creates a variable indicating that a record came from that dataset

Pseudo example:

data garbage;
   merge set1 (in=in1)
              set2 (in=in2)
  ;
  if in1 and in2 then source='Both';
  else if in1 then source = 'set1';
  else if in2 then source ='set2';
run;

I have to assume your merge would have some by variable(s) but that wouldn't make a difference in the options and if/then/else code.

_MVB_ · Posted 06-19-2020 04:48 PM

This is what I have now:

- dataset1 has 10,000 records

- dataset2 has 1,000 records

data dataset1;

set dataset1;

format ds $1.;

ds='A';

run;

data dataset2;

set dataset2;

format ds $1.

ds='B';

run;

data merged;

set dataset1 dataset2;

run;

Final 'merged' dataset has 11,000 records with one new variable 'ds' which has 'A' for the 10,000 records and 'B' for other 1,000 records.

SAS Programming

Merge two datasets with creation of an identifier

Re: Merge two datasets with creation of an identifier

Re: Merge two datasets with creation of an identifier

Re: Merge two datasets with creation of an identifier

Re: Merge two datasets with creation of an identifier

Re: Merge two datasets with creation of an identifier

Re: Merge two datasets with creation of an identifier

Re: Merge two datasets with creation of an identifier

Re: Merge two datasets with creation of an identifier

Re: Merge two datasets with creation of an identifier

Re: Merge two datasets with creation of an identifier

Re: Merge two datasets with creation of an identifier

Re: Merge two datasets with creation of an identifier

Re: Merge two datasets with creation of an identifier

Re: Merge two datasets with creation of an identifier

Re: Merge two datasets with creation of an identifier

Dataset creation

Merging By Variable

set and merge

Help Merging Datasets

Merging 3 Datasets

Follow Us

What is...

SAS Programming

Special offer for SAS Communities members

SAS Training: Just a Click Away

Follow Us

What is...