BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
User_2024
Obsidian | Level 7

I have two datasets that I need both to be merged in my result.  DataSet A has roughly 40K rows and DataSet B has 139K rows. Both has two common columns acct_num and acct_id. When I sort the datasets with acct_num and merge them with acct_num, I get the mergedataset with 139K rows skipping ALL of the data from dataset A . Ideally my merge dataset should have 144343 observations and 35 variables. 

I tried reading the other post with same query but I am not able to implement the solution.  

I also tried sorting on the acct_id but the mergedataset is not helping. It would be really a learning if I can get to know the ways to resolve it. 

Appreciate your time and suggestions. Pasted the warnings from Sas.  My code is error free. 

 

 

 

 

 

NOTE: SAS variable labels, formats, and lengths are not written to DBMS tables.
NOTE: MERGE statement has more than one data set with repeats of BY values.
NOTE: There were 39585 observations read from the data set MYORA.SETA.
NOTE: There were 138977 observations read from the data set MYORA.SETB.
NOTE: The data set MYORA.MERGEDATA has 138977 observations and 35 variables.
NOTE: DATA statement used (Total process time):
      real time           7.22 seconds
      cpu time            3.96 seconds


NOTE: Remote submit to ACXIOM complete.
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

You need to explain more what you are trying to do to get better advice.  What do the two datasets represent?  

 

Why would you expect to get more observations out than the number of observations in the larger dataset?  Do you expect some of the BY values to appear only the smaller dataset?

 

If they have two common variables why are you merging only one one of them?  The information stored in that second common variable will be lost.  Perhaps that also explains why you have multiple observations for the same value of the BY variable in both datasets.

 

It is very rare that would actually ever want to MERGE two dataset where both datasets contribute multiple observations in the same BY group.  When you do that SAS will match the observations in the order they appear within the BY group until one of the datasets runs out.  At which point the values for the variables only contributed by that smaller dataset from the last observation in the BY group from that dataset are carried forward onto all of the extra observations from the other dataset (since there are no more observations being read in to replace the values of those variables).

 

If you just want to stack (or interleave) the observations then use a SET statement, not a MERGE statement.  That will result in the same number of observations out as in.  But that is only normally useful with the dataset have all the same variables.

View solution in original post

3 REPLIES 3
Tom
Super User Tom
Super User

You need to explain more what you are trying to do to get better advice.  What do the two datasets represent?  

 

Why would you expect to get more observations out than the number of observations in the larger dataset?  Do you expect some of the BY values to appear only the smaller dataset?

 

If they have two common variables why are you merging only one one of them?  The information stored in that second common variable will be lost.  Perhaps that also explains why you have multiple observations for the same value of the BY variable in both datasets.

 

It is very rare that would actually ever want to MERGE two dataset where both datasets contribute multiple observations in the same BY group.  When you do that SAS will match the observations in the order they appear within the BY group until one of the datasets runs out.  At which point the values for the variables only contributed by that smaller dataset from the last observation in the BY group from that dataset are carried forward onto all of the extra observations from the other dataset (since there are no more observations being read in to replace the values of those variables).

 

If you just want to stack (or interleave) the observations then use a SET statement, not a MERGE statement.  That will result in the same number of observations out as in.  But that is only normally useful with the dataset have all the same variables.

User_2024
Obsidian | Level 7

The SET statement helped to achieve the desired outcome.  It has some layovers that I am trying to fix now. Appreciate your support. 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 467 views
  • 1 like
  • 3 in conversation