BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
User_2024
Obsidian | Level 7

I have two datasets that I need both to be merged in my result.  DataSet A has roughly 40K rows and DataSet B has 139K rows. Both has two common columns acct_num and acct_id. When I sort the datasets with acct_num and merge them with acct_num, I get the mergedataset with 139K rows skipping ALL of the data from dataset A . Ideally my merge dataset should have 144343 observations and 35 variables. 

I tried reading the other post with same query but I am not able to implement the solution.  

I also tried sorting on the acct_id but the mergedataset is not helping. It would be really a learning if I can get to know the ways to resolve it. 

Appreciate your time and suggestions. Pasted the warnings from Sas.  My code is error free. 

 

 

 

 

 

NOTE: SAS variable labels, formats, and lengths are not written to DBMS tables.
NOTE: MERGE statement has more than one data set with repeats of BY values.
NOTE: There were 39585 observations read from the data set MYORA.SETA.
NOTE: There were 138977 observations read from the data set MYORA.SETB.
NOTE: The data set MYORA.MERGEDATA has 138977 observations and 35 variables.
NOTE: DATA statement used (Total process time):
      real time           7.22 seconds
      cpu time            3.96 seconds


NOTE: Remote submit to ACXIOM complete.
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

You need to explain more what you are trying to do to get better advice.  What do the two datasets represent?  

 

Why would you expect to get more observations out than the number of observations in the larger dataset?  Do you expect some of the BY values to appear only the smaller dataset?

 

If they have two common variables why are you merging only one one of them?  The information stored in that second common variable will be lost.  Perhaps that also explains why you have multiple observations for the same value of the BY variable in both datasets.

 

It is very rare that would actually ever want to MERGE two dataset where both datasets contribute multiple observations in the same BY group.  When you do that SAS will match the observations in the order they appear within the BY group until one of the datasets runs out.  At which point the values for the variables only contributed by that smaller dataset from the last observation in the BY group from that dataset are carried forward onto all of the extra observations from the other dataset (since there are no more observations being read in to replace the values of those variables).

 

If you just want to stack (or interleave) the observations then use a SET statement, not a MERGE statement.  That will result in the same number of observations out as in.  But that is only normally useful with the dataset have all the same variables.

View solution in original post

3 REPLIES 3
Tom
Super User Tom
Super User

You need to explain more what you are trying to do to get better advice.  What do the two datasets represent?  

 

Why would you expect to get more observations out than the number of observations in the larger dataset?  Do you expect some of the BY values to appear only the smaller dataset?

 

If they have two common variables why are you merging only one one of them?  The information stored in that second common variable will be lost.  Perhaps that also explains why you have multiple observations for the same value of the BY variable in both datasets.

 

It is very rare that would actually ever want to MERGE two dataset where both datasets contribute multiple observations in the same BY group.  When you do that SAS will match the observations in the order they appear within the BY group until one of the datasets runs out.  At which point the values for the variables only contributed by that smaller dataset from the last observation in the BY group from that dataset are carried forward onto all of the extra observations from the other dataset (since there are no more observations being read in to replace the values of those variables).

 

If you just want to stack (or interleave) the observations then use a SET statement, not a MERGE statement.  That will result in the same number of observations out as in.  But that is only normally useful with the dataset have all the same variables.

User_2024
Obsidian | Level 7

The SET statement helped to achieve the desired outcome.  It has some layovers that I am trying to fix now. Appreciate your support. 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 712 views
  • 1 like
  • 3 in conversation