BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Hello_there
Lapis Lazuli | Level 10

Hi, I'm having log message pop up that says that one of my data sets im trying to merge has repeat values.

I'm trying to pull specific subjid/param combos (in the Have data set below) from the main data set that i'm trying to explore, and the message pops up. I would like to avoid this message.

Thanks

data have;  * unique subjid/param combos list;
infile datalines dsd dlm=",";
	input subjid $ param $;
datalines;
001, red
002, orange
002, yellow
002, green
003, indigo
003, purple
;
run;
proc sort; by subjid, param; run;
data main; * main data set i'm trying to explore;
infile datalines dsd dlm=",";
	input subjid $ param $ visit $ value;
datalines;
001, red, first, 2
001, orange, first, 2
001, yellow, first, 2
001, yellow, second, 3
002, yellow, first, 4
002, green, second, 4
002, green, third, 5
002, indigo, first, 1
003, red, second, 2
003, blue, first, 2
003, indigo, first, 3
003, indigo, second, 4
003, indigo, third, 4
003, purple, first, 5
003, purple, fifth, 5
;
run;
proc sort; by subjid param; run;

desired output:

Hello_there_0-1668455036805.png

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

The code you posted is not going to generate such a message.

What code did you try that generated the message?

 

If you merge the data you have you won't get that message either since the BY variables uniquely identify the data the HAVE dataset.

473  data want;
474    merge have(in=in1) main(in=in2);
475    by subjid param;
476    in_have=in1;
477    in_main=in2;
478  run;

NOTE: There were 6 observations read from the data set WORK.HAVE.
NOTE: There were 15 observations read from the data set WORK.MAIN.
NOTE: The data set WORK.WANT has 16 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
Obs    subjid    param     visit     value    in_have    in_main

  1     001      orange    first       2         0          1
  2     001      red       first       2         1          1
  3     001      yellow    first       2         0          1
  4     001      yellow    second      3         0          1
  5     002      green     second      4         1          1
  6     002      green     third       5         1          1
  7     002      indigo    first       1         0          1
  8     002      orange                .         1          0
  9     002      yellow    first       4         1          1
 10     003      blue      first       2         0          1
 11     003      indigo    first       3         1          1
 12     003      indigo    second      4         1          1
 13     003      indigo    third       4         1          1
 14     003      purple    first       5         1          1
 15     003      purple    fifth       5         1          1
 16     003      red       second      2         0          1

View solution in original post

8 REPLIES 8
Quentin
Super User

Can you show the MERGE code you tried?  Are you merging BY SubjID Param?

The Boston Area SAS Users Group (BASUG) is hosting our in person SAS Blowout on Oct 18!
This full-day event in Cambridge, Mass features four presenters from SAS, presenting on a range of SAS 9 programming topics. Pre-registration by Oct 15 is required.
Full details and registration info at https://www.basug.org/events.
Hello_there
Lapis Lazuli | Level 10
Hi Quentin, I have my merged code on another computer, and the real MAIN data sets includes almost 40 variables that i'm trying to include.
So to picture, the MAIN data set has multiple values of the combination of SUBJID and PARAM because there are multiple visits. But i'm hoping to grab all those rows that include every visit.

Yes, i am merging by SUBJID PARAM.
Tom
Super User Tom
Super User

The code you posted is not going to generate such a message.

What code did you try that generated the message?

 

If you merge the data you have you won't get that message either since the BY variables uniquely identify the data the HAVE dataset.

473  data want;
474    merge have(in=in1) main(in=in2);
475    by subjid param;
476    in_have=in1;
477    in_main=in2;
478  run;

NOTE: There were 6 observations read from the data set WORK.HAVE.
NOTE: There were 15 observations read from the data set WORK.MAIN.
NOTE: The data set WORK.WANT has 16 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
Obs    subjid    param     visit     value    in_have    in_main

  1     001      orange    first       2         0          1
  2     001      red       first       2         1          1
  3     001      yellow    first       2         0          1
  4     001      yellow    second      3         0          1
  5     002      green     second      4         1          1
  6     002      green     third       5         1          1
  7     002      indigo    first       1         0          1
  8     002      orange                .         1          0
  9     002      yellow    first       4         1          1
 10     003      blue      first       2         0          1
 11     003      indigo    first       3         1          1
 12     003      indigo    second      4         1          1
 13     003      indigo    third       4         1          1
 14     003      purple    first       5         1          1
 15     003      purple    fifth       5         1          1
 16     003      red       second      2         0          1
Hello_there
Lapis Lazuli | Level 10
Hi Tom,
I did it this way, except i called them "a" and "b" respectively, and said if "a" and "b"
Hello_there
Lapis Lazuli | Level 10
I'm wondering if i could avoid the message by using PROC sql instead, but i'm not too familiar w/ PROC SQL.
Hello_there
Lapis Lazuli | Level 10
After double checking my code, i noticed that one of the input data sets had multiple instances of the same combination of subjid/param. Once i sorted w/ the nodup option, it eliminated any repeats, and the merge worked.

Thanks for helping!
Tom
Super User Tom
Super User

If both dataset have duplicate observations for some SUBJID and PARAM value then what would merging them even mean?

 

Perhaps you just want to interleave the observations instead.

data want;
  set main have;
  by subjid param;
run;
Hello_there
Lapis Lazuli | Level 10
What had happened was I forgot to delete duplicate observations in the data set that included only subjid and param in my real work (not the example in this thread).

Once I did, your code worked for me.

Thanks again!

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 998 views
  • 0 likes
  • 3 in conversation