Hello,
I would like to merge 2 dataset. data A has a value for my region= RRE, AAA, OTHER etc and data B has region = RRE, AAA etc. Data B does NOT have the OTHER but data A has both OTHER and the correct value.
After merging, I want my final data to have only the correct value from data B instead of the OTHER from data A.
data A;
input Region $ 1-5 Installation $;
datalines;
RRE USS
OTHER USS
OTHER USS
AAA USTU
AAA USTU
OTHER USS
OTHER USTU
OTHER UNSP
OTHER UNSP
OTHER UNSP
; run;
data B ;
input Region $ 1-5 Installation $;
datalines;
RRE USS
RRE USS
RRE USS
AAA USTU
AAA USTU
AAA USTU
OTHER UNSP
OTHER UNSP
OTHER UNSP
; run;
output
data C
RRE | USS |
RRE | USS |
RRE | USS |
RRE | USS |
AAA | USTU |
AAA | USTU |
AAA | USTU |
OTHER | UNSP |
OTHER | UNSP |
OTHER | UNSP |
My code is not giving the expected output. I also used the line which I commented out but still did not get the expected output C
data C;
merge A (in=in1 )
B (in=in2)
if in2 then REGION = in1;
*if REGION = 'OTHER' and installation ne 'UNSP' then REGION = 'in2';
run;
In the example you have provided, you don't need the data set A at all. You coudl simply forget about merging and just use:
data C;
set B;
run;
Please include an example where the data set A is needed.
@Astounding @Amir Sorry for the mix-up. I have updated the message. In the output, i want the number of records to be 10 records not 9.
Hi,
Please confirm that you expect data set A to be the same as what the following code creates:
data A;
input
Region : $char5.
Installation $char50.
;
datalines;
RRE USS OTHER USS OTHER USS
AAA USTU
AAA USTU
OTHER USTU OTHER UNSP OTHER UNSP OTHER UNSP
;
If data set A should be different then please clarify by editing the above code.
Lastly, please clarify what you expect the output data set to look like by sharing code like the above, but for your output based on the input data sets A & B that you have posted.
Thanks & kind regards,
Amir.
Edit: Clarification.
Your MERGE statement is missing the semicolon.
You do have any BY statement . How is the data step supposed to know which observations from A should be combined with which observations from B?
Both datasets have the exact same variables. So merging them in that way will overwrite the values from A with the values from B until B runs out of observations. And if you don't want that extra observation from A then why bother using it at all?
Hi,
Thanks for the updates. Please respond explaining what the logical rules are for bringing the two data sets together.
For example, RRE is in data set A once and 3 times in data set B then 4 times in the output; is this 1 from A and 3 from B?
Also, AAA appears twice in data set A and three times in data set B then 3 times in the output; is this only from B?
Lastly, your question says "Data B does NOT have the OTHER ...", but there are 3 OTHER records in B. Please clarify.
Thanks & kind regards,
Amir.
Below two possible options.
/* option 1 */
data want_1;
if _n_=1 then
do;
dcl hash h1(dataset:'a');
h1.defineKey('Installation');
h1.defineData('Region');
h1.defineDone();
end;
set b;
_rc=h1.find();
drop _rc;
run;
proc print data=want_1;
run;
/* option 2 */
proc sort nodupkey data=a out=a_unique;
by Installation;
run;
proc sort data=b;
by Installation;
run;
data want_2;
merge b a_unique;
by Installation;
run;
proc print data=want_2;
run;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.