Solved: MERGE statement has more than one data set with repeats of BY values.

User_2024

I have two datasets that I need both to be merged in my result. DataSet A has roughly 40K rows and DataSet B has 139K rows. Both has two common columns acct_num and acct_id. When I sort the datasets with acct_num and merge them with acct_num, I get the mergedataset with 139K rows skipping ALL of the data from dataset A . Ideally my merge dataset should have 144343 observations and 35 variables.

I tried reading the other post with same query but I am not able to implement the solution.

I also tried sorting on the acct_id but the mergedataset is not helping. It would be really a learning if I can get to know the ways to resolve it.

Appreciate your time and suggestions. Pasted the warnings from Sas. My code is error free.

NOTE: SAS variable labels, formats, and lengths are not written to DBMS tables.
NOTE: MERGE statement has more than one data set with repeats of BY values.
NOTE: There were 39585 observations read from the data set MYORA.SETA.
NOTE: There were 138977 observations read from the data set MYORA.SETB.
NOTE: The data set MYORA.MERGEDATA has 138977 observations and 35 variables.
NOTE: DATA statement used (Total process time):
      real time           7.22 seconds
      cpu time            3.96 seconds


NOTE: Remote submit to ACXIOM complete.

Tom

You need to explain more what you are trying to do to get better advice. What do the two datasets represent?

Why would you expect to get more observations out than the number of observations in the larger dataset? Do you expect some of the BY values to appear only the smaller dataset?

If they have two common variables why are you merging only one one of them? The information stored in that second common variable will be lost. Perhaps that also explains why you have multiple observations for the same value of the BY variable in both datasets.

It is very rare that would actually ever want to MERGE two dataset where both datasets contribute multiple observations in the same BY group. When you do that SAS will match the observations in the order they appear within the BY group until one of the datasets runs out. At which point the values for the variables only contributed by that smaller dataset from the last observation in the BY group from that dataset are carried forward onto all of the extra observations from the other dataset (since there are no more observations being read in to replace the values of those variables).

If you just want to stack (or interleave) the observations then use a SET statement, not a MERGE statement. That will result in the same number of observations out as in. But that is only normally useful with the dataset have all the same variables.

View solution in original post

Tom

You need to explain more what you are trying to do to get better advice. What do the two datasets represent?

Why would you expect to get more observations out than the number of observations in the larger dataset? Do you expect some of the BY values to appear only the smaller dataset?

If they have two common variables why are you merging only one one of them? The information stored in that second common variable will be lost. Perhaps that also explains why you have multiple observations for the same value of the BY variable in both datasets.

It is very rare that would actually ever want to MERGE two dataset where both datasets contribute multiple observations in the same BY group. When you do that SAS will match the observations in the order they appear within the BY group until one of the datasets runs out. At which point the values for the variables only contributed by that smaller dataset from the last observation in the BY group from that dataset are carried forward onto all of the extra observations from the other dataset (since there are no more observations being read in to replace the values of those variables).

If you just want to stack (or interleave) the observations then use a SET statement, not a MERGE statement. That will result in the same number of observations out as in. But that is only normally useful with the dataset have all the same variables.

User_2024

The SET statement helped to achieve the desired outcome. It has some layovers that I am trying to fix now. Appreciate your support.

Kurt_Bremser

Providing usable (DATA steps with DATALINES) example data gives us a clear picture of structure and content, so please DO THAT.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

MERGE statement has more than one data set with repeats of BY values.

Re: MERGE statement has more than one data set with repeats of BY values.

Re: MERGE statement has more than one data set with repeats of BY values.

Re: MERGE statement has more than one data set with repeats of BY values.

Re: MERGE statement has more than one data set with repeats of BY values.

MERGE statement has more than one data set with repeats of BY values.

Re: MERGE statement has more than one data set with repeats of BY values.

Re: MERGE statement has more than one data set with repeats of BY values.

Re: MERGE statement has more than one data set with repeats of BY values.

Re: MERGE statement has more than one data set with repeats of BY values.

Ready to join fellow brilliant minds for the SAS Hackathon?

Click image to register for webinar

Classroom Training Available!