Problem with SAS Match merging interleaving the data sets

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 10
Accepted Solution

Problem with SAS Match merging interleaving the data sets

[ Edited ]

Hello,

 

 I am trying to merge three separate SAS datasets by two variables (ID Date)  that are present in all three datasets. Two of three datasets successfully match merge. However, when I try to merge the third dataset in, instead of doing a match merge, it interleaves the datasets. I have checked the formatting and lengths of the variables in the three datasets using proc contents and they appear to be the same. I even tried to redefine the format, length, and label for each dataset to ensure they were identical and I am still not having luck. Any suggestions? Below is an example of my merge code. Dataset 1 and 2 (infect.weights and infect.temperature) contain four variables (ID, Group, Date,Weight/Temperature). Dataset 2 (infect.counts) contains nine variables (ID, Date, X1-X7).

 

 

DATA infect.weights;
 SET infect.weights;
 ATTRIB ID LENGTH=8 FORMAT=8. LABEL="ID";
 ATTRIB DATE LENGTH=8 FORMAT=MMDDYY10. LABEL="Date";
RUN;

DATA infect.temperature;
 SET infect.temperature;
 ATTRIB ID LENGTH=8 FORMAT=8. LABEL="ID";
 ATTRIB DATE LENGTH=8 FORMAT=MMDDYY10. LABEL="Date";
RUN;

DATA infect.cell;
 SET infect.qscout;
 
 ATTRIB ID LENGTH=8 FORMAT=8. LABEL="ID";
 ATTRIB DATE LENGTH=8 FORMAT=MMDDYY10. LABEL="Date";

RUN;

DATA parameters;
	MERGE infect.weights infect.temperature;
	BY ID Date;
RUN;

DATA parameters2;
	MERGE parameters infect.counts;
	BY ID Date;
RUN;

/* I have also tried to perform the merge within one data step and get the same results */ 

 

 

My Results end up looking like this:

 

ID    Group   Date                 Weight   Temp   X1    X2      X3    X4    X5    X6    X7
1          1      01/01/2001         100    100     .      .     .     .     .     .      .
1          1      01/01/2001            .     .     1      2     3     4     5     6      7 
2          1      01/01/2001         125    101     .      .      .    .     .     .      .
2          1      01/01/2001            .     .     8      9     10    11    12    13    14
 

Thanks in advance!

 


Accepted Solutions
Solution
‎03-29-2018 08:51 AM
Super User
Posts: 13,583

Re: Problem with SAS Match merging interleaving the data sets

[ Edited ]

The result you show indicates that the internal stored value of either ID or Date (or possibly both) varies somehow. If the variables are character look for blank characters such as a null at the end or beginning of the value.

If numeric, check that there are not decimals in values that aren't displayed. Note that date values with decimals and assigned one of the SAS date formats will happily ignore the decimal portion for display but use the entire value for comparisons such as merge or logic tests.

 

See:

data example;
   date= '01Jan2018'd + 0.01;
   format date mmddyy10.;
run;
data example2;
   date= '01Jan2018'd ;
   format date mmddyy10.;
run;

data merged;
   merge example example2;
   by date;
run;

I might try proc freq to display the ID or Date value using a BEST12. or similar to see if decimals are in your data. If so you might try rounding the values in a data step prior to the merge. 

 

View solution in original post


All Replies
Super User
Posts: 23,776

Re: Problem with SAS Match merging interleaving the data sets

What does the log show?

 

Spoiler

@HHutch wrote:

Hello,

 

 I am trying to merge three separate SAS datasets by two variables (ID Date)  that are present in all three datasets. Two of three datasets successfully match merge. However, when I try to merge the third dataset in, instead of doing a match merge, it interleaves the datasets. I have checked the formatting and lengths of the variables in the three datasets using proc contents and they appear to be the same. I even tried to redefine the format, length, and label for each dataset to ensure they were identical and I am still not having luck. Any suggestions? Below is an example of my merge code. Dataset 1 and 2 (infect.weights and infect.temperature) contain four variables (ID, Group, Date,Weight/Temperature). Dataset 2 (infect.counts) contains nine variables (ID, Date, X1-X7).

 

 

DATA infect.weights;
 SET infect.weights;
 ATTRIB ID LENGTH=8 FORMAT=8. LABEL="ID";
 ATTRIB DATE LENGTH=8 FORMAT=MMDDYY10. LABEL="Date";
RUN;

DATA infect.temperature;
 SET infect.temperature;
 ATTRIB ID LENGTH=8 FORMAT=8. LABEL="ID";
 ATTRIB DATE LENGTH=8 FORMAT=MMDDYY10. LABEL="Date";
RUN;

DATA infect.cell;
 SET infect.qscout;
 
 ATTRIB ID LENGTH=8 FORMAT=8. LABEL="ID";
 ATTRIB DATE LENGTH=8 FORMAT=MMDDYY10. LABEL="Date";

RUN;

DATA parameters;
	MERGE infect.weights infect.temperature;
	BY ID Date;
RUN;

DATA parameters2;
	MERGE parameters infect.counts;
	BY ID Date;
RUN;

/* I have also tried to perform the merge within one data step and get the same results */ 

 

 

My Results end up looking like this:

 

ID    Group   Date                 Weight   Temp   X1    X2      X3    X4    X5    X6    X7
1          1      01/01/2001         100    100     .      .     .     .     .     .      .
1          1      01/01/2001            .     .     1      2     3     4     5     6      7 
2          1      01/01/2001         125    101     .      .      .    .     .     .      .
2          1      01/01/2001            .     .     8      9     10    11    12    13    14
 

Thanks in advance!

 


Occasional Contributor
Posts: 10

Re: Problem with SAS Match merging interleaving the data sets

The log shows 


327 DATA parameters2;
328 MERGE parameters infect.qscout;
329 BY ID Date;
330 RUN;

NOTE: There were 375 observations read from the data set WORK.PARAMETERS.
NOTE: There were 233 observations read from the data set INFECT.QSCOUT.
NOTE: The data set WORK.PARAMETERS2 has 608 observations and 12 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds

Solution
‎03-29-2018 08:51 AM
Super User
Posts: 13,583

Re: Problem with SAS Match merging interleaving the data sets

[ Edited ]

The result you show indicates that the internal stored value of either ID or Date (or possibly both) varies somehow. If the variables are character look for blank characters such as a null at the end or beginning of the value.

If numeric, check that there are not decimals in values that aren't displayed. Note that date values with decimals and assigned one of the SAS date formats will happily ignore the decimal portion for display but use the entire value for comparisons such as merge or logic tests.

 

See:

data example;
   date= '01Jan2018'd + 0.01;
   format date mmddyy10.;
run;
data example2;
   date= '01Jan2018'd ;
   format date mmddyy10.;
run;

data merged;
   merge example example2;
   by date;
run;

I might try proc freq to display the ID or Date value using a BEST12. or similar to see if decimals are in your data. If so you might try rounding the values in a data step prior to the merge. 

 

Occasional Contributor
Posts: 10

Re: Problem with SAS Match merging interleaving the data sets

You were right! The one file's date contains values after the decimal point. I didn't realize the dataset it was being pulled from used a date/time format.

 

Thank you for your help! It is greatly appreciated!

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 85 views
  • 1 like
  • 3 in conversation