Hi there,
I have two datasets, that I need to combine in a specific way, and I can't seem to get what I need. Any help greatly appreciated. I'm not sure if I'm using the wrong command here....
What I need to do is: I want to create a dataset, which contains all lines of dataset B, which start with an ID that is also listed in dataset A. I created the 'identify' variable to check that only lines from dataset B are in my dataset (dataset_C).
My two datasets are structured as follows:
Dataset_A
ID Drug Dose Date Identify
1 a b c 1
3 d e f 1
5 g h e 1
Dataset_B
ID Drug Dose Date Identify
1 a b c 2
1 c c c 2
2 o p q 2
3 d e f 2
3 b b b 2
4 u v w 2
5 g h e 2
I used this command:
proc sort data=dataset A;
by id;
run;
proc sort data=dataset B;
by id;
run;
data dataset_C;
merge dataset_A (in = in1) dataset_B (in = in2);
by id;
if in 1 and in2;
run;
However: when I check dataset_C there are no duplicate lines, but some lines have identify=1 and some have identify=2. How can I get it to only overwrite lines from dataset_A. Or is there a better way to program this, without using the merge command?
So what I want to get is:
Dataset_C
ID Drug Dose Date Identify
1 a b c 2
1 c c c 2
3 d e f 2
3 b b b 2
5 g h e 2
But what I currently get is this: there are just some random identify=1 inthere (not systematically in all lines from Dataset_A though)
Dataset_C
ID Drug Dose Date Identify
1 a b c 1
1 c c c 2
3 d e f 2
3 b b b 2
5 g h e 1
I've tried using the data set command instead, which didn't work. I also tried dropping all variables except for ID and identify from dataset_A, but that also did not work.
Many thanks in advance! Julia
With a (rather) recent addition to the SAS data step tools, you can avoid doing a sort:
data dataset_c;
set dataset_b;
if _n_ = 1
then do;
declare hash a (dataset:"dataset_a");
a.definekey("id");
a.definedone();
end;
if a.check() = 0;
run;
Hi Kurt,
Thank you so much, your second suggetion works like a charm. I was not aware of this command.
Thank you so much for the quick reply and apologies, I did not meant to repost my question. I had some error messages when trying to submit it, which may cause this. I'll try to delete them.
Best and thanks so much again, Julia
Since you only need ID for your check, make sure that only this variable is read:
merge
dataset_A (
in = in1
keep = id
)
dataset_B (
in = in2
)
;
if in1 and in2;
otherwise values from dataset_b will overwrite values from dataset_a.
Edit: moved the KEEP= option to dataset_a.
With a (rather) recent addition to the SAS data step tools, you can avoid doing a sort:
data dataset_c;
set dataset_b;
if _n_ = 1
then do;
declare hash a (dataset:"dataset_a");
a.definekey("id");
a.definedone();
end;
if a.check() = 0;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.