Hi, I'm getting a log message saying that one of the data sets I'm trying to merge has repeated values.
I'm trying to pull specific subjid/param combos (listed in the HAVE data set below) from the main data set I'm exploring, and that message pops up. I would like to avoid it.
Thanks
data have; * unique subjid/param combos list;
infile datalines dsd dlm=",";
input subjid $ param $;
datalines;
001, red
002, orange
002, yellow
002, green
003, indigo
003, purple
;
run;
proc sort data=have; by subjid param; run;
data main; /* main data set I am trying to explore */
infile datalines dsd dlm=",";
input subjid $ param $ visit $ value;
datalines;
001, red, first, 2
001, orange, first, 2
001, yellow, first, 2
001, yellow, second, 3
002, yellow, first, 4
002, green, second, 4
002, green, third, 5
002, indigo, first, 1
003, red, second, 2
003, blue, first, 2
003, indigo, first, 3
003, indigo, second, 4
003, indigo, third, 4
003, purple, first, 5
003, purple, fifth, 5
;
run;
proc sort data=main; by subjid param; run;
desired output:
The code you posted is not going to generate such a message.
What code did you try that generated the message?
If you merge the data you have, you won't get that message either, since the BY variables uniquely identify the observations in the HAVE dataset.
473  data want;
474    merge have(in=in1) main(in=in2);
475    by subjid param;
476    in_have=in1;
477    in_main=in2;
478  run;

NOTE: There were 6 observations read from the data set WORK.HAVE.
NOTE: There were 15 observations read from the data set WORK.MAIN.
NOTE: The data set WORK.WANT has 16 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
Obs  subjid  param   visit   value  in_have  in_main
  1  001     orange  first     2       0        1
  2  001     red     first     2       1        1
  3  001     yellow  first     2       0        1
  4  001     yellow  second    3       0        1
  5  002     green   second    4       1        1
  6  002     green   third     5       1        1
  7  002     indigo  first     1       0        1
  8  002     orange            .       1        0
  9  002     yellow  first     4       1        1
 10  003     blue    first     2       0        1
 11  003     indigo  first     3       1        1
 12  003     indigo  second    4       1        1
 13  003     indigo  third     4       1        1
 14  003     purple  first     5       1        1
 15  003     purple  fifth     5       1        1
 16  003     red     second    2       0        1
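The uniqueness claim above can be checked directly: SAS only writes that note when more than one data set in the MERGE has repeated SUBJID/PARAM values. A minimal sketch, assuming HAVE and MAIN already exist and are sorted BY SUBJID PARAM as in the question; HAVE_DUP_KEYS and HAVE_UNIQUE are hypothetical output names:

```sas
/* Assumes HAVE was created and sorted BY SUBJID PARAM as in the question. */
/* List every row of HAVE whose SUBJID/PARAM pair occurs more than once.   */
data have_dup_keys;
  set have;
  by subjid param;
  /* A pair is unique only if its row is both first and last in its BY group. */
  if not (first.param and last.param);
run;

/* If HAVE could ever contain repeats, keep one row per SUBJID/PARAM pair. */
proc sort data=have out=have_unique nodupkey;
  by subjid param;
run;
```

With the posted data HAVE_DUP_KEYS should be empty. Pointing the same check at MAIN would list its repeated pairs (for example 003 indigo), which is expected: one input with repeats is fine, two or more is what triggers the note.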
Can you show the MERGE code you tried? Are you merging BY SubjID Param?
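If the goal is only to keep the MAIN rows whose SUBJID/PARAM pair is listed in HAVE, a sketch of merging BY SubjID Param and subsetting on the IN= flags could look like this. It assumes HAVE and MAIN were created by the DATA steps in the question; the flag names IN_HAVE and IN_MAIN are illustrative:

```sas
/* Assumes HAVE and MAIN were created by the DATA steps in the question. */
/* MERGE with BY requires both inputs sorted by the BY variables.       */
proc sort data=have;
  by subjid param;
run;

proc sort data=main;
  by subjid param;
run;

/* Keep only rows whose SUBJID/PARAM pair appears in both data sets. */
data want;
  merge have(in=in_have) main(in=in_main);
  by subjid param;
  if in_have and in_main;
run;
```

With the sample data this should keep 9 of the 15 MAIN rows, the ones flagged 1 in both IN= columns of the listing above. The IN= variables are temporary, so WANT holds only SUBJID, PARAM, VISIT and VALUE.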
If both datasets have duplicate observations for some SUBJID and PARAM values, then what would merging them even mean?
Perhaps you just want to interleave the observations instead.
data want;
  /* Interleave: stack the rows of MAIN and HAVE in SUBJID/PARAM order */
  set main have;
  by subjid param;
run;
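If what you really need is the filter described in the question rather than a true merge, one swapped-in alternative (a different technique from MERGE) is a PROC SQL correlated subquery. A sketch, assuming HAVE and MAIN exist as created in the question; WANT_SQL is a hypothetical output name:

```sas
/* PROC SQL alternative; assumes HAVE and MAIN exist as in the question. */
/* No pre-sorting is needed, and EXISTS keeps each MAIN row at most once */
/* even if HAVE listed the same SUBJID/PARAM pair more than once.        */
proc sql;
  create table want_sql as
    select m.*
    from main as m
    where exists (select *
                  from have as h
                  where h.subjid = m.subjid
                    and h.param  = m.param)
    order by subjid, param;
quit;
```

With the sample data this should also keep 9 of the 15 MAIN rows. Because no MERGE statement is involved, the repeats-of-BY-values note cannot come up, although PROC SQL does not promise to preserve the original VISIT order within each SUBJID/PARAM group.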