So I'm trying to merge two data sets based on the variable provider_number. When I run my merge statement, I get output but with a unique observation for each data set (each unique provider_number appears twice, once showing data for my data set A variables and once showing data for my data set B variables).
Data Set A = WORK.MASTER1
Data Set B = WORK.HQI_HOSP_SPP2
Merged Data Set = WORK.MASTER2
Just to note, I went back through my coding to ensure that provider_number is coded with the same format and length ($10.).
Let me know if you'd like to see more detailed coding for the two data sets I'm trying to merge. I've looked through them and everything looks good.
CODING:
PROC SORT DATA=WORK.MASTER1;
BY Provider_Number;
RUN;
PROC SORT DATA=WORK.HQI_HOSP_SPP2;
BY Provider_Number;
RUN;
DATA WORK.MASTER2 ;
MERGE WORK.MASTER1 (in=A) WORK.HQI_HOSP_SPP2 (in=b);
BY Provider_Number;
RUN;
PROC PRINT DATA=WORK.MASTER2;
TITLE1 MASTER FILE;
RUN;
PROC CONTENTS DATA=WORK.MASTER2;
TITLE1 MASTER FILE;
RUN;
LOG:
1899 PROC SORT DATA=WORK.MASTER1;
1900 BY Provider_Number;
1901 RUN;
NOTE: Input data set is already sorted, no sorting done.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
1902
1903 PROC SORT DATA=WORK.HQI_HOSP_SPP2;
1904 BY Provider_Number;
1905 RUN;
NOTE: Input data set is already sorted, no sorting done.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
1906
1907 DATA WORK.MASTER2 ;
1908 MERGE WORK.MASTER1 (in=A) WORK.HQI_HOSP_SPP2 (in=b);
1909 BY Provider_Number;
1910 RUN;
NOTE: There were 4826 observations read from the data set WORK.MASTER1.
NOTE: There were 3349 observations read from the data set WORK.HQI_HOSP_SPP2.
NOTE: The data set WORK.MASTER2 has 8175 observations and 20 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
1911
1912 PROC PRINT DATA=WORK.MASTER2;
NOTE: Writing HTML Body file: sashtml2.htm
1913 TITLE1 MASTER FILE;
1914 RUN;
NOTE: There were 8175 observations read from the data set WORK.MASTER2.
NOTE: PROCEDURE PRINT used (Total process time):
real time 8.26 seconds
cpu time 7.37 seconds
1915
1916 PROC CONTENTS DATA=WORK.MASTER2;
1917 TITLE1 MASTER FILE;
1918 RUN;
NOTE: PROCEDURE CONTENTS used (Total process time):
real time 0.09 seconds
cpu time 0.03 seconds
I will guess that there is a difference between provider_number between the two files. Is provider_number a character variable?
If so, I would look carefully at the first 10 or so records from each of the input files and try to identify what the difference is. It could be leading spaces, trailing spaces, leading zeros. No way for us to tell without seeing actual data.
I will guess that there is a difference between provider_number between the two files. Is provider_number a character variable?
If so, I would look carefully at the first 10 or so records from each of the input files and try to identify what the difference is. It could be leading spaces, trailing spaces, leading zeros. No way for us to tell without seeing actual data.
Your code looks fine. My guess is that it is something subtle in the provider_number. You mention that it is a text field. If you have leading spaces in one, but no the other, that will do it. Also the case of any letters must be the same to make a match. Another thing to check for is non-printable control characters in the field.
If the provider_number is actually a number, try converting to a numeric field and then doing the merge; that would address alignment issues as well as potential control characters (they typically cause an error message on conversion).
Doc Muhlbaier
Duke
Could be a simple alignment problem. Try to right align provider_number in both datasets before sorting.
data A; set A; provider_number=right(provider_number); run;
data B; set B; provider_number=right(provider_number); run;
PG
Hey guys,
You were all right. I went in and expanded all of my PROC IMPORT statements then ran PROC CONTENTS to really see the details of what was going on. I was losing the format with a transformation I did.
Thanks for the help!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.