Desktop productivity for business analysts and programmers

Why won't my data sets merge? (I get a data set with all my data, but still separated)

Reply
Contributor
Posts: 50

Why won't my data sets merge? (I get a data set with all my data, but still separated)

So I'm trying to merge two data sets based on the variable provider_number.  When I run my merge statement, I get output but with a unique observation for each data set (each unique provider_number appears twice, once showing data for my data set A variables and once showing data for my data set B variables).

Data Set A = WORK.MASTER1

  • 17 variables
  • 4826 observations

Data Set B = WORK.HQI_HOSP_SPP2

  • 4 variables
  • 3349 observations (not all hospitals are present in this data set)

Merged Data Set = WORK.MASTER2

  • 20 variables
  • 8175 observations (= 4826 + 3349)

Just to note, I went back through my coding to ensure that provider_number is coded with the same format and length ($10.).

Let me know if you'd like to see more detailed coding for the two data sets I'm trying to merge.  I've looked through them and everything looks good.

CODING: 

PROC SORT DATA=WORK.MASTER1;

  BY Provider_Number;

RUN;

PROC SORT DATA=WORK.HQI_HOSP_SPP2;

  BY Provider_Number;

RUN;

DATA WORK.MASTER2 ;

  MERGE WORK.MASTER1 (in=A) WORK.HQI_HOSP_SPP2 (in=b);

  BY Provider_Number;

RUN;

PROC PRINT DATA=WORK.MASTER2;

TITLE1 MASTER FILE;

RUN;

PROC CONTENTS DATA=WORK.MASTER2;

TITLE1 MASTER FILE;

RUN;

LOG:

1899  PROC SORT DATA=WORK.MASTER1;

1900      BY Provider_Number;

1901  RUN;

NOTE: Input data set is already sorted, no sorting done.

NOTE: PROCEDURE SORT used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

1902

1903  PROC SORT DATA=WORK.HQI_HOSP_SPP2;

1904      BY Provider_Number;

1905  RUN;

NOTE: Input data set is already sorted, no sorting done.

NOTE: PROCEDURE SORT used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

1906

1907  DATA WORK.MASTER2 ;

1908    MERGE WORK.MASTER1 (in=A) WORK.HQI_HOSP_SPP2 (in=b);

1909    BY Provider_Number;

1910  RUN;

NOTE: There were 4826 observations read from the data set WORK.MASTER1.

NOTE: There were 3349 observations read from the data set WORK.HQI_HOSP_SPP2.

NOTE: The data set WORK.MASTER2 has 8175 observations and 20 variables.

NOTE: DATA statement used (Total process time):

      real time           0.01 seconds

      cpu time            0.01 seconds

1911

1912  PROC PRINT DATA=WORK.MASTER2;

NOTE: Writing HTML Body file: sashtml2.htm

1913  TITLE1 MASTER FILE;

1914  RUN;

NOTE: There were 8175 observations read from the data set WORK.MASTER2.

NOTE: PROCEDURE PRINT used (Total process time):

      real time           8.26 seconds

      cpu time            7.37 seconds

1915

1916  PROC CONTENTS DATA=WORK.MASTER2;

1917  TITLE1 MASTER FILE;

1918  RUN;

NOTE: PROCEDURE CONTENTS used (Total process time):

      real time           0.09 seconds

      cpu time            0.03 seconds

PROC Star
Posts: 7,433

Re: Why won't my data sets merge? (I get a data set with all my data, but still separated)

I will guess that there is a difference between provider_number between the two files.  Is provider_number a character variable?

If so, I would look carefully at the first 10 or so records from each of the input files and try to identify what the difference is.  It could be leading spaces, trailing spaces, leading zeros.  No way for us to tell without seeing actual data.

Trusted Advisor
Posts: 2,114

Re: Why won't my data sets merge? (I get a data set with all my data, but still separated)

Your code looks fine.  My guess is that it is something subtle in the provider_number.  You mention that it is a text field.  If you have leading spaces in one, but no the other, that will do it.  Also the case of any letters must be the same to make a match.  Another thing to check for is non-printable control characters in the field.

If the provider_number is actually a number, try converting to a numeric field and then doing the merge; that would address alignment issues as well as potential control characters (they typically cause an error message on conversion).

Doc Muhlbaier

Duke

Respected Advisor
Posts: 4,814

Re: Why won't my data sets merge? (I get a data set with all my data, but still separated)

Could be a simple alignment problem. Try to right align provider_number in both datasets before sorting.

data A; set A; provider_number=right(provider_number); run;

data B; set B; provider_number=right(provider_number); run;

PG

PG
Contributor
Posts: 50

Re: Why won't my data sets merge? (I get a data set with all my data, but still separated)

Hey guys,

You were all right.  I went in and expanded all of my PROC IMPORT statements then ran PROC CONTENTS to really see the details of what was going on.  I was losing the format with a transformation I did.

Thanks for the help!

Ask a Question
Discussion stats
  • 4 replies
  • 1194 views
  • 0 likes
  • 4 in conversation