BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
HHutch
Calcite | Level 5

Hello,

 

 I am trying to merge three separate SAS datasets by two variables (ID Date)  that are present in all three datasets. Two of three datasets successfully match merge. However, when I try to merge the third dataset in, instead of doing a match merge, it interleaves the datasets. I have checked the formatting and lengths of the variables in the three datasets using proc contents and they appear to be the same. I even tried to redefine the format, length, and label for each dataset to ensure they were identical and I am still not having luck. Any suggestions? Below is an example of my merge code. Dataset 1 and 2 (infect.weights and infect.temperature) contain four variables (ID, Group, Date,Weight/Temperature). Dataset 2 (infect.counts) contains nine variables (ID, Date, X1-X7).

 

 

DATA infect.weights;
 SET infect.weights;
 ATTRIB ID LENGTH=8 FORMAT=8. LABEL="ID";
 ATTRIB DATE LENGTH=8 FORMAT=MMDDYY10. LABEL="Date";
RUN;

DATA infect.temperature;
 SET infect.temperature;
 ATTRIB ID LENGTH=8 FORMAT=8. LABEL="ID";
 ATTRIB DATE LENGTH=8 FORMAT=MMDDYY10. LABEL="Date";
RUN;

DATA infect.cell;
 SET infect.qscout;
 
 ATTRIB ID LENGTH=8 FORMAT=8. LABEL="ID";
 ATTRIB DATE LENGTH=8 FORMAT=MMDDYY10. LABEL="Date";

RUN;

DATA parameters;
	MERGE infect.weights infect.temperature;
	BY ID Date;
RUN;

DATA parameters2;
	MERGE parameters infect.counts;
	BY ID Date;
RUN;

/* I have also tried to perform the merge within one data step and get the same results */ 

 

 

My Results end up looking like this:

 

ID    Group   Date                 Weight   Temp   X1    X2      X3    X4    X5    X6    X7
1          1      01/01/2001         100    100     .      .     .     .     .     .      .
1          1      01/01/2001            .     .     1      2     3     4     5     6      7 
2          1      01/01/2001         125    101     .      .      .    .     .     .      .
2          1      01/01/2001            .     .     8      9     10    11    12    13    14
 

Thanks in advance!

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

The result you show indicates that the internal stored value of either ID or Date (or possibly both) varies somehow. If the variables are character look for blank characters such as a null at the end or beginning of the value.

If numeric, check that there are not decimals in values that aren't displayed. Note that date values with decimals and assigned one of the SAS date formats will happily ignore the decimal portion for display but use the entire value for comparisons such as merge or logic tests.

 

See:

data example;
   date= '01Jan2018'd + 0.01;
   format date mmddyy10.;
run;
data example2;
   date= '01Jan2018'd ;
   format date mmddyy10.;
run;

data merged;
   merge example example2;
   by date;
run;

I might try proc freq to display the ID or Date value using a BEST12. or similar to see if decimals are in your data. If so you might try rounding the values in a data step prior to the merge. 

 

View solution in original post

4 REPLIES 4
Reeza
Super User

What does the log show?

 

Spoiler

@HHutch wrote:

Hello,

 

 I am trying to merge three separate SAS datasets by two variables (ID Date)  that are present in all three datasets. Two of three datasets successfully match merge. However, when I try to merge the third dataset in, instead of doing a match merge, it interleaves the datasets. I have checked the formatting and lengths of the variables in the three datasets using proc contents and they appear to be the same. I even tried to redefine the format, length, and label for each dataset to ensure they were identical and I am still not having luck. Any suggestions? Below is an example of my merge code. Dataset 1 and 2 (infect.weights and infect.temperature) contain four variables (ID, Group, Date,Weight/Temperature). Dataset 2 (infect.counts) contains nine variables (ID, Date, X1-X7).

 

 

DATA infect.weights;
 SET infect.weights;
 ATTRIB ID LENGTH=8 FORMAT=8. LABEL="ID";
 ATTRIB DATE LENGTH=8 FORMAT=MMDDYY10. LABEL="Date";
RUN;

DATA infect.temperature;
 SET infect.temperature;
 ATTRIB ID LENGTH=8 FORMAT=8. LABEL="ID";
 ATTRIB DATE LENGTH=8 FORMAT=MMDDYY10. LABEL="Date";
RUN;

DATA infect.cell;
 SET infect.qscout;
 
 ATTRIB ID LENGTH=8 FORMAT=8. LABEL="ID";
 ATTRIB DATE LENGTH=8 FORMAT=MMDDYY10. LABEL="Date";

RUN;

DATA parameters;
	MERGE infect.weights infect.temperature;
	BY ID Date;
RUN;

DATA parameters2;
	MERGE parameters infect.counts;
	BY ID Date;
RUN;

/* I have also tried to perform the merge within one data step and get the same results */ 

 

 

My Results end up looking like this:

 

ID    Group   Date                 Weight   Temp   X1    X2      X3    X4    X5    X6    X7
1          1      01/01/2001         100    100     .      .     .     .     .     .      .
1          1      01/01/2001            .     .     1      2     3     4     5     6      7 
2          1      01/01/2001         125    101     .      .      .    .     .     .      .
2          1      01/01/2001            .     .     8      9     10    11    12    13    14
 

Thanks in advance!

 


HHutch
Calcite | Level 5

The log shows 


327 DATA parameters2;
328 MERGE parameters infect.qscout;
329 BY ID Date;
330 RUN;

NOTE: There were 375 observations read from the data set WORK.PARAMETERS.
NOTE: There were 233 observations read from the data set INFECT.QSCOUT.
NOTE: The data set WORK.PARAMETERS2 has 608 observations and 12 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds

ballardw
Super User

The result you show indicates that the internal stored value of either ID or Date (or possibly both) varies somehow. If the variables are character look for blank characters such as a null at the end or beginning of the value.

If numeric, check that there are not decimals in values that aren't displayed. Note that date values with decimals and assigned one of the SAS date formats will happily ignore the decimal portion for display but use the entire value for comparisons such as merge or logic tests.

 

See:

data example;
   date= '01Jan2018'd + 0.01;
   format date mmddyy10.;
run;
data example2;
   date= '01Jan2018'd ;
   format date mmddyy10.;
run;

data merged;
   merge example example2;
   by date;
run;

I might try proc freq to display the ID or Date value using a BEST12. or similar to see if decimals are in your data. If so you might try rounding the values in a data step prior to the merge. 

 

HHutch
Calcite | Level 5

You were right! The one file's date contains values after the decimal point. I didn't realize the dataset it was being pulled from used a date/time format.

 

Thank you for your help! It is greatly appreciated!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 663 views
  • 1 like
  • 3 in conversation