BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jspoend
Obsidian | Level 7

Hi there, 

 

I have two datasets, that I need to combine in a specific way, and I can't seem to get what I need. Any help greatly appreciated. I'm not sure if I'm using the wrong command here....

 

What I need to do is: I want to create a dataset, which contains all lines of dataset B, which start with an ID that is also listed in dataset A. I created the 'identify' variable to check that only lines from dataset B are in my dataset (dataset_C). 

 

My two datasets are structured as follows: 

 

Dataset_A

ID Drug Dose Date Identify

1 a b c 1

3 d e f 1

5 g h e 1

 

Dataset_B

ID Drug Dose Date Identify

1 a b c 2

1 c c c 2

2 o p q 2

3 d e f 2

3 b b b 2

4 u v w 2

5 g h e 2

 

I used this command:

 

 

proc sort data=dataset A;
by id;
run;

proc sort data=dataset B;
by id;
run;

data dataset_C;
merge dataset_A (in = in1) dataset_B (in = in2);
by id;
if in 1 and in2;
run;

 

However: when I check dataset_C there are no duplicate lines, but some lines have identify=1 and some have identify=2. How can I get it to only overwrite lines from dataset_A. Or is there a better way to program this, without using the merge command? 

 

So what I want to get is: 

 

Dataset_C

ID Drug Dose Date Identify

1 a b c 2

1 c c c 2

3 d e f 2

3 b b b 2

5 g h e 2

 

But what I currently get is this: there are just some random identify=1 inthere (not systematically in all lines from Dataset_A though)

 

Dataset_C

ID Drug Dose Date Identify

1 a b c 1

1 c c c 2

3 d e f 2

3 b b b 2

5 g h e 1

 

I've tried using the data set command instead, which didn't work. I also tried dropping all variables except for ID and identify from dataset_A, but that also did not work. 

 

Many thanks in advance! Julia

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

With a (rather) recent addition to the SAS data step tools, you can avoid doing a sort:

data dataset_c;
set dataset_b;
if _n_ = 1
then do;
  declare hash a (dataset:"dataset_a");
  a.definekey("id");
  a.definedone();
end;
if a.check() = 0;
run;

View solution in original post

4 REPLIES 4
jspoend
Obsidian | Level 7

Hi Kurt, 

 

Thank you so much, your second suggetion works like a charm. I was not aware of this command. 

 

Thank you so much for the quick reply and apologies, I did not meant to repost my question. I had some error messages when trying to submit it, which may cause this. I'll try to delete them. 

 

Best and thanks so much again, Julia

Kurt_Bremser
Super User

Since you only need ID for your check, make sure that only this variable is read:

merge
  dataset_A (
    in = in1
    keep = id
  )
  dataset_B (
    in = in2
  )
;
if in1 and in2;

otherwise values from dataset_b will overwrite values from dataset_a.

 

Edit: moved the KEEP= option to dataset_a.

Kurt_Bremser
Super User

With a (rather) recent addition to the SAS data step tools, you can avoid doing a sort:

data dataset_c;
set dataset_b;
if _n_ = 1
then do;
  declare hash a (dataset:"dataset_a");
  a.definekey("id");
  a.definedone();
end;
if a.check() = 0;
run;
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1333 views
  • 3 likes
  • 2 in conversation