Hi All!
Looking for a quick and short solution (a replacement to a 2-3 steps logic I have), where I want to use a data step which will include merging two datasets (exactly same variables on both), and while merging also create a new variable called identifier which will specify whether a record coming from dataset1 or dataset2 on the output dataset.
data output;
merge a(in=ina) b(in=inb);
by <byvar list>;
if ina and inb then exists='inainb';
else if ina then exists ='ina';
else if inb then exists ='inb';
run;
@smantha Thanks for a quick reply.
I think I need a slightly different logic, as I do not need to check whether a record is in the first/second/both datasets.
I know I have exactly same records on both datasets, as second is a reduced dataset created from first dataset.
I simply need to create an indicator while merging the two datasets, so if the record coming from dataset1 it will mark it as A, but if a record coming from dataset2 it will mark it as B, and put it on a newly created variable.
Hope this clarifies better.
@Reeza combining two datasets
on the dataset2 I have exactly same records and on dataset1, the difference in between the two is that dataset1 has several times more records, and all of the records on dataset2 would be found on dataset1.
I do not need to check if any of the records are on both or only one dataset, but simply combine the two dataset and have a new variable created which would specify either record is coming from dataset1 or dataset2.
@_MVB_ wrote:
@Reeza combining two datasets
on the dataset2 I have exactly same records and on dataset1, the difference in between the two is that dataset1 has several times more records, and all of the records on dataset2 would be found on dataset1.
I do not need to check if any of the records are on both or only one dataset, but simply combine the two dataset and have a new variable created which would specify either record is coming from dataset1 or dataset2.
Did you run it and try it?
@Reeza it works, however I do not want to have a work.database1 or work.database2 in my output dataset. I can, of course put an extra line in that procedure where specify if work.database1 then A, else if work.database2 then B, but isn't there another way just to specify either A or B without IF THEN statement?
@_MVB_ wrote:
@Reeza it works, however I do not want to have a work.database1 or work.database2 in my output dataset. I can, of course put an extra line in that procedure where specify if work.database1 then A, else if work.database2 then B, but isn't there another way just to specify either A or B without IF THEN statement?
May be we get faster to your desired result using sample data.
With below sample data how should the desired result look like and how does it differ from what below code generates?
data a;
do i=1 to 10;
output;
end;
stop;
run;
data b;
set a;
if i in (1,5,7);
run;
data want;
length _inds inds $41;
set a b indsname=_inds;
inds=_inds;
run;
proc print data=want;
run;
In the end, I modified solution proposed by @Reeza just adding IF THEN statement to replace work.DATASET1 and work.DATASET2 fields with A or B.
Thanks for the feedback.
Please mark one of @Reeza's answers as solution so that this discussion gets closed.
@Reeza Thanks!
This is what I was looking for in terms of the short solution.
It would help to show the code you are currently using.
The dataset option IN= creates a variable indicating that a record came from that dataset
Pseudo example:
data garbage; merge set1 (in=in1) set2 (in=in2) ; if in1 and in2 then source='Both'; else if in1 then source = 'set1'; else if in2 then source ='set2'; run;
I have to assume your merge would have some by variable(s) but that wouldn't make a difference in the options and if/then/else code.
This is what I have now:
- dataset1 has 10,000 records
- dataset2 has 1,000 records
data dataset1;
set dataset1;
format ds $1.;
ds='A';
run;
data dataset2;
set dataset2;
format ds $1.
ds='B';
run;
data merged;
set dataset1 dataset2;
run;
Final 'merged' dataset has 11,000 records with one new variable 'ds' which has 'A' for the 10,000 records and 'B' for other 1,000 records.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.