BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
HHutch
Calcite | Level 5

Hello,

 

 I am trying to merge three separate SAS datasets by two variables (ID Date)  that are present in all three datasets. Two of three datasets successfully match merge. However, when I try to merge the third dataset in, instead of doing a match merge, it interleaves the datasets. I have checked the formatting and lengths of the variables in the three datasets using proc contents and they appear to be the same. I even tried to redefine the format, length, and label for each dataset to ensure they were identical and I am still not having luck. Any suggestions? Below is an example of my merge code. Dataset 1 and 2 (infect.weights and infect.temperature) contain four variables (ID, Group, Date,Weight/Temperature). Dataset 2 (infect.counts) contains nine variables (ID, Date, X1-X7).

 

 

DATA infect.weights;
 SET infect.weights;
 ATTRIB ID LENGTH=8 FORMAT=8. LABEL="ID";
 ATTRIB DATE LENGTH=8 FORMAT=MMDDYY10. LABEL="Date";
RUN;

DATA infect.temperature;
 SET infect.temperature;
 ATTRIB ID LENGTH=8 FORMAT=8. LABEL="ID";
 ATTRIB DATE LENGTH=8 FORMAT=MMDDYY10. LABEL="Date";
RUN;

DATA infect.cell;
 SET infect.qscout;
 
 ATTRIB ID LENGTH=8 FORMAT=8. LABEL="ID";
 ATTRIB DATE LENGTH=8 FORMAT=MMDDYY10. LABEL="Date";

RUN;

DATA parameters;
	MERGE infect.weights infect.temperature;
	BY ID Date;
RUN;

DATA parameters2;
	MERGE parameters infect.counts;
	BY ID Date;
RUN;

/* I have also tried to perform the merge within one data step and get the same results */ 

 

 

My Results end up looking like this:

 

ID    Group   Date                 Weight   Temp   X1    X2      X3    X4    X5    X6    X7
1          1      01/01/2001         100    100     .      .     .     .     .     .      .
1          1      01/01/2001            .     .     1      2     3     4     5     6      7 
2          1      01/01/2001         125    101     .      .      .    .     .     .      .
2          1      01/01/2001            .     .     8      9     10    11    12    13    14
 

Thanks in advance!

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

The result you show indicates that the internal stored value of either ID or Date (or possibly both) varies somehow. If the variables are character look for blank characters such as a null at the end or beginning of the value.

If numeric, check that there are not decimals in values that aren't displayed. Note that date values with decimals and assigned one of the SAS date formats will happily ignore the decimal portion for display but use the entire value for comparisons such as merge or logic tests.

 

See:

data example;
   date= '01Jan2018'd + 0.01;
   format date mmddyy10.;
run;
data example2;
   date= '01Jan2018'd ;
   format date mmddyy10.;
run;

data merged;
   merge example example2;
   by date;
run;

I might try proc freq to display the ID or Date value using a BEST12. or similar to see if decimals are in your data. If so you might try rounding the values in a data step prior to the merge. 

 

View solution in original post

4 REPLIES 4
Reeza
Super User

What does the log show?

 

Spoiler

@HHutch wrote:

Hello,

 

 I am trying to merge three separate SAS datasets by two variables (ID Date)  that are present in all three datasets. Two of three datasets successfully match merge. However, when I try to merge the third dataset in, instead of doing a match merge, it interleaves the datasets. I have checked the formatting and lengths of the variables in the three datasets using proc contents and they appear to be the same. I even tried to redefine the format, length, and label for each dataset to ensure they were identical and I am still not having luck. Any suggestions? Below is an example of my merge code. Dataset 1 and 2 (infect.weights and infect.temperature) contain four variables (ID, Group, Date,Weight/Temperature). Dataset 2 (infect.counts) contains nine variables (ID, Date, X1-X7).

 

 

DATA infect.weights;
 SET infect.weights;
 ATTRIB ID LENGTH=8 FORMAT=8. LABEL="ID";
 ATTRIB DATE LENGTH=8 FORMAT=MMDDYY10. LABEL="Date";
RUN;

DATA infect.temperature;
 SET infect.temperature;
 ATTRIB ID LENGTH=8 FORMAT=8. LABEL="ID";
 ATTRIB DATE LENGTH=8 FORMAT=MMDDYY10. LABEL="Date";
RUN;

DATA infect.cell;
 SET infect.qscout;
 
 ATTRIB ID LENGTH=8 FORMAT=8. LABEL="ID";
 ATTRIB DATE LENGTH=8 FORMAT=MMDDYY10. LABEL="Date";

RUN;

DATA parameters;
	MERGE infect.weights infect.temperature;
	BY ID Date;
RUN;

DATA parameters2;
	MERGE parameters infect.counts;
	BY ID Date;
RUN;

/* I have also tried to perform the merge within one data step and get the same results */ 

 

 

My Results end up looking like this:

 

ID    Group   Date                 Weight   Temp   X1    X2      X3    X4    X5    X6    X7
1          1      01/01/2001         100    100     .      .     .     .     .     .      .
1          1      01/01/2001            .     .     1      2     3     4     5     6      7 
2          1      01/01/2001         125    101     .      .      .    .     .     .      .
2          1      01/01/2001            .     .     8      9     10    11    12    13    14
 

Thanks in advance!

 


HHutch
Calcite | Level 5

The log shows 


327 DATA parameters2;
328 MERGE parameters infect.qscout;
329 BY ID Date;
330 RUN;

NOTE: There were 375 observations read from the data set WORK.PARAMETERS.
NOTE: There were 233 observations read from the data set INFECT.QSCOUT.
NOTE: The data set WORK.PARAMETERS2 has 608 observations and 12 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds

ballardw
Super User

The result you show indicates that the internal stored value of either ID or Date (or possibly both) varies somehow. If the variables are character look for blank characters such as a null at the end or beginning of the value.

If numeric, check that there are not decimals in values that aren't displayed. Note that date values with decimals and assigned one of the SAS date formats will happily ignore the decimal portion for display but use the entire value for comparisons such as merge or logic tests.

 

See:

data example;
   date= '01Jan2018'd + 0.01;
   format date mmddyy10.;
run;
data example2;
   date= '01Jan2018'd ;
   format date mmddyy10.;
run;

data merged;
   merge example example2;
   by date;
run;

I might try proc freq to display the ID or Date value using a BEST12. or similar to see if decimals are in your data. If so you might try rounding the values in a data step prior to the merge. 

 

HHutch
Calcite | Level 5

You were right! The one file's date contains values after the decimal point. I didn't realize the dataset it was being pulled from used a date/time format.

 

Thank you for your help! It is greatly appreciated!

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 698 views
  • 1 like
  • 3 in conversation