- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi All!
Looking for a quick and short solution (a replacement to a 2-3 steps logic I have), where I want to use a data step which will include merging two datasets (exactly same variables on both), and while merging also create a new variable called identifier which will specify whether a record coming from dataset1 or dataset2 on the output dataset.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data want;
set have1 have2 indsname= source;
data_source = source;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data output;
merge a(in=ina) b(in=inb);
by <byvar list>;
if ina and inb then exists='inainb';
else if ina then exists ='ina';
else if inb then exists ='inb';
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@smantha Thanks for a quick reply.
I think I need a slightly different logic, as I do not need to check whether a record is in the first/second/both datasets.
I know I have exactly same records on both datasets, as second is a reduced dataset created from first dataset.
I simply need to create an indicator while merging the two datasets, so if the record coming from dataset1 it will mark it as A, but if a record coming from dataset2 it will mark it as B, and put it on a newly created variable.
Hope this clarifies better.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data want;
set have1 have2 indsname= source;
data_source = source;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Reeza combining two datasets
on the dataset2 I have exactly same records and on dataset1, the difference in between the two is that dataset1 has several times more records, and all of the records on dataset2 would be found on dataset1.
I do not need to check if any of the records are on both or only one dataset, but simply combine the two dataset and have a new variable created which would specify either record is coming from dataset1 or dataset2.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@_MVB_ wrote:
@Reeza combining two datasets
on the dataset2 I have exactly same records and on dataset1, the difference in between the two is that dataset1 has several times more records, and all of the records on dataset2 would be found on dataset1.
I do not need to check if any of the records are on both or only one dataset, but simply combine the two dataset and have a new variable created which would specify either record is coming from dataset1 or dataset2.
Did you run it and try it?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Reeza it works, however I do not want to have a work.database1 or work.database2 in my output dataset. I can, of course put an extra line in that procedure where specify if work.database1 then A, else if work.database2 then B, but isn't there another way just to specify either A or B without IF THEN statement?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@_MVB_ wrote:
@Reeza it works, however I do not want to have a work.database1 or work.database2 in my output dataset. I can, of course put an extra line in that procedure where specify if work.database1 then A, else if work.database2 then B, but isn't there another way just to specify either A or B without IF THEN statement?
May be we get faster to your desired result using sample data.
With below sample data how should the desired result look like and how does it differ from what below code generates?
data a;
do i=1 to 10;
output;
end;
stop;
run;
data b;
set a;
if i in (1,5,7);
run;
data want;
length _inds inds $41;
set a b indsname=_inds;
inds=_inds;
run;
proc print data=want;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
In the end, I modified solution proposed by @Reeza just adding IF THEN statement to replace work.DATASET1 and work.DATASET2 fields with A or B.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for the feedback.
Please mark one of @Reeza's answers as solution so that this discussion gets closed.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
set have (in=a1) have2 (in=a2);
source = ifn(a1=1, 'A', 'B');
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Reeza Thanks!
This is what I was looking for in terms of the short solution.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It would help to show the code you are currently using.
The dataset option IN= creates a variable indicating that a record came from that dataset
Pseudo example:
data garbage; merge set1 (in=in1) set2 (in=in2) ; if in1 and in2 then source='Both'; else if in1 then source = 'set1'; else if in2 then source ='set2'; run;
I have to assume your merge would have some by variable(s) but that wouldn't make a difference in the options and if/then/else code.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This is what I have now:
- dataset1 has 10,000 records
- dataset2 has 1,000 records
data dataset1;
set dataset1;
format ds $1.;
ds='A';
run;
data dataset2;
set dataset2;
format ds $1.
ds='B';
run;
data merged;
set dataset1 dataset2;
run;
Final 'merged' dataset has 11,000 records with one new variable 'ds' which has 'A' for the 10,000 records and 'B' for other 1,000 records.