Solved: Re: How do I avoid the Merge warning for repeat values in a data set w...

Hello_there · Posted 11-14-2022 02:44 PM

Hi, I'm having log message pop up that says that one of my data sets im trying to merge has repeat values.

I'm trying to pull specific subjid/param combos (in the Have data set below) from the main data set that i'm trying to explore, and the message pops up. I would like to avoid this message.

Thanks

data have;  * unique subjid/param combos list;
infile datalines dsd dlm=",";
	input subjid $ param $;
datalines;
001, red
002, orange
002, yellow
002, green
003, indigo
003, purple
;
run;
proc sort; by subjid, param; run;

data main; * main data set i'm trying to explore;
infile datalines dsd dlm=",";
	input subjid $ param $ visit $ value;
datalines;
001, red, first, 2
001, orange, first, 2
001, yellow, first, 2
001, yellow, second, 3
002, yellow, first, 4
002, green, second, 4
002, green, third, 5
002, indigo, first, 1
003, red, second, 2
003, blue, first, 2
003, indigo, first, 3
003, indigo, second, 4
003, indigo, third, 4
003, purple, first, 5
003, purple, fifth, 5
;
run;
proc sort; by subjid param; run;

desired output:

Tom · Posted 11-14-2022 02:55 PM

The code you posted is not going to generate such a message.

What code did you try that generated the message?

If you merge the data you have you won't get that message either since the BY variables uniquely identify the data the HAVE dataset.

473  data want;
474    merge have(in=in1) main(in=in2);
475    by subjid param;
476    in_have=in1;
477    in_main=in2;
478  run;

NOTE: There were 6 observations read from the data set WORK.HAVE.
NOTE: There were 15 observations read from the data set WORK.MAIN.
NOTE: The data set WORK.WANT has 16 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

Obs    subjid    param     visit     value    in_have    in_main

  1     001      orange    first       2         0          1
  2     001      red       first       2         1          1
  3     001      yellow    first       2         0          1
  4     001      yellow    second      3         0          1
  5     002      green     second      4         1          1
  6     002      green     third       5         1          1
  7     002      indigo    first       1         0          1
  8     002      orange                .         1          0
  9     002      yellow    first       4         1          1
 10     003      blue      first       2         0          1
 11     003      indigo    first       3         1          1
 12     003      indigo    second      4         1          1
 13     003      indigo    third       4         1          1
 14     003      purple    first       5         1          1
 15     003      purple    fifth       5         1          1
 16     003      red       second      2         0          1

View solution in original post

Quentin · Posted 11-14-2022 02:51 PM

Can you show the MERGE code you tried? Are you merging BY SubjID Param?

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.

Hello_there · Posted 11-14-2022 02:54 PM

Hi Quentin, I have my merged code on another computer, and the real MAIN data sets includes almost 40 variables that i'm trying to include.
So to picture, the MAIN data set has multiple values of the combination of SUBJID and PARAM because there are multiple visits. But i'm hoping to grab all those rows that include every visit.

Yes, i am merging by SUBJID PARAM.

Tom · Posted 11-14-2022 02:55 PM

The code you posted is not going to generate such a message.

What code did you try that generated the message?

If you merge the data you have you won't get that message either since the BY variables uniquely identify the data the HAVE dataset.

473  data want;
474    merge have(in=in1) main(in=in2);
475    by subjid param;
476    in_have=in1;
477    in_main=in2;
478  run;

NOTE: There were 6 observations read from the data set WORK.HAVE.
NOTE: There were 15 observations read from the data set WORK.MAIN.
NOTE: The data set WORK.WANT has 16 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

Obs    subjid    param     visit     value    in_have    in_main

  1     001      orange    first       2         0          1
  2     001      red       first       2         1          1
  3     001      yellow    first       2         0          1
  4     001      yellow    second      3         0          1
  5     002      green     second      4         1          1
  6     002      green     third       5         1          1
  7     002      indigo    first       1         0          1
  8     002      orange                .         1          0
  9     002      yellow    first       4         1          1
 10     003      blue      first       2         0          1
 11     003      indigo    first       3         1          1
 12     003      indigo    second      4         1          1
 13     003      indigo    third       4         1          1
 14     003      purple    first       5         1          1
 15     003      purple    fifth       5         1          1
 16     003      red       second      2         0          1

Hello_there · Posted 11-14-2022 03:03 PM

Hi Tom,
I did it this way, except i called them "a" and "b" respectively, and said if "a" and "b"

Hello_there · Posted 11-14-2022 03:04 PM

I'm wondering if i could avoid the message by using PROC sql instead, but i'm not too familiar w/ PROC SQL.

Hello_there · Posted 11-14-2022 03:22 PM

After double checking my code, i noticed that one of the input data sets had multiple instances of the same combination of subjid/param. Once i sorted w/ the nodup option, it eliminated any repeats, and the merge worked.

Thanks for helping!

Tom · Posted 11-14-2022 03:50 PM

If both dataset have duplicate observations for some SUBJID and PARAM value then what would merging them even mean?

Perhaps you just want to interleave the observations instead.

data want;
  set main have;
  by subjid param;
run;

Hello_there · Posted 11-14-2022 04:35 PM

What had happened was I forgot to delete duplicate observations in the data set that included only subjid and param in my real work (not the example in this thread).

Once I did, your code worked for me.

Thanks again!

How do I avoid the Merge warning for repeat values in a data set with this problem?

Re: How do I avoid the Merge warning for repeat values in a data set with this problem?

Re: How do I avoid the Merge warning for repeat values in a data set with this problem?

Re: How do I avoid the Merge warning for repeat values in a data set with this problem?

Re: How do I avoid the Merge warning for repeat values in a data set with this problem?

Re: How do I avoid the Merge warning for repeat values in a data set with this problem?

Re: How do I avoid the Merge warning for repeat values in a data set with this problem?

Re: How do I avoid the Merge warning for repeat values in a data set with this problem?

Re: How do I avoid the Merge warning for repeat values in a data set with this problem?

Re: How do I avoid the Merge warning for repeat values in a data set with this problem?

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away