topic Re: Creating dataset dropping all lines of IDs which are not listed in another dataset in SAS Programming

Creating dataset dropping all lines of IDs which are not listed in another dataset

jspoend — Tue, 01 Feb 2022 08:35:29 GMT

Hi there,

I have two datasets, that I need to combine in a specific way, and I can't seem to get what I need. Any help greatly appreciated. I'm not sure if I'm using the wrong command here....

What I need to do is: I want to create a dataset, which contains all lines of dataset B, which start with an ID that is also listed in dataset A. I created the 'identify' variable to check that only lines from dataset B are in my dataset (dataset_C).

My two datasets are structured as follows:

Dataset_A

ID Drug Dose Date Identify

1 a b c 1

3 d e f 1

5 g h e 1

Dataset_B

ID Drug Dose Date Identify

1 a b c 2

1 c c c 2

2 o p q 2

3 d e f 2

3 b b b 2

4 u v w 2

5 g h e 2

I used this command:

proc sort data=dataset A;
by id;
run;

proc sort data=dataset B;
by id;
run;

data dataset_C;
merge dataset_A (in = in1) dataset_B (in = in2);
by id;
if in 1 and in2;
run;

However: when I check dataset_C there are no duplicate lines, but some lines have identify=1 and some have identify=2. How can I get it to only overwrite lines from dataset_A. Or is there a better way to program this, without using the merge command?

So what I want to get is:

Dataset_C

ID Drug Dose Date Identify

1 a b c 2

1 c c c 2

3 d e f 2

3 b b b 2

5 g h e 2

But what I currently get is this: there are just some random identify=1 inthere (not systematically in all lines from Dataset_A though)

Dataset_C

ID Drug Dose Date Identify

1 a b c 1

1 c c c 2

3 d e f 2

3 b b b 2

5 g h e 1

I've tried using the data set command instead, which didn't work. I also tried dropping all variables except for ID and identify from dataset_A, but that also did not work.

Many thanks in advance! Julia

Your second suggestion works like a charm

jspoend — Tue, 01 Feb 2022 10:27:05 GMT

Hi Kurt,

Thank you so much, your second suggetion works like a charm. I was not aware of this command.

Thank you so much for the quick reply and apologies, I did not meant to repost my question. I had some error messages when trying to submit it, which may cause this. I'll try to delete them.

Best and thanks so much again, Julia

Re: Creating dataset dropping all lines of IDs which are not listed in another dataset

Kurt_Bremser — Tue, 01 Feb 2022 08:44:07 GMT

Since you only need ID for your check, make sure that only this variable is read:

merge
  dataset_A (
    in = in1
    keep = id
  )
  dataset_B (
    in = in2
  )
;
if in1 and in2;

otherwise values from dataset_b will overwrite values from dataset_a.

Edit: moved the KEEP= option to dataset_a.

Re: Creating dataset dropping all lines of IDs which are not listed in another dataset

Kurt_Bremser — Tue, 01 Feb 2022 08:48:19 GMT

With a (rather) recent addition to the SAS data step tools, you can avoid doing a sort:

data dataset_c;
set dataset_b;
if _n_ = 1
then do;
  declare hash a (dataset:"dataset_a");
  a.definekey("id");
  a.definedone();
end;
if a.check() = 0;
run;

Re: Create dataset of IDs that are listed in another dataset

Kurt_Bremser — Tue, 01 Feb 2022 09:09:50 GMT

Please do not double-post.