BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
parmis
Fluorite | Level 6

Hi,

 

I'm trying to inner join two datasets with 4 field in each. 3 of the fields are the same in both tables;

data set Nov: ACCTNB, ID, SourceACCT, Amount_Nov

data set Dec: ACCTNB, ID, SourceACCT, Amount_Dec

I'm trying o inner join these data sets to compare the amounts and see if  customers are paying different amount in two months. The problem is if there is one transaction for ID1 in Dec and three transactions for the same ID in Nov, SAS duplicates the Dec data three times to match Nov data.

how can I solve this problem?

 

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Name the amount variable same in all datasets.

Use a data step to concatenate all datasets, and use the indsname option, so that the sources are identified in the final dataset.

Now you can use by-group processing to summarize for each month, each account, etc

View solution in original post

5 REPLIES 5
Reeza
Super User

This means you have duplicates that you need to remove OR you need to add a condition to your join or WHERE to filter out the records. 

Determine why some have duplicates first, or how you would identify which records to keep and then we can help you out. 

 


@parmis wrote:

Hi,

 

I'm trying to inner join two datasets with 4 field in each. 3 of the fields are the same in both tables;

data set Nov: ACCTNB, ID, SourceACCT, Amount_Nov

data set Dec: ACCTNB, ID, SourceACCT, Amount_Dec

I'm trying o inner join these data sets to compare the amounts and see if  customers are paying different amount in two months. The problem is if there is one transaction for ID1 in Dec and three transactions for the same ID in Nov, SAS duplicates the Dec data three times to match Nov data.

how can I solve this problem?

 

Thank you


 

Kurt_Bremser
Super User

Name the amount variable same in all datasets.

Use a data step to concatenate all datasets, and use the indsname option, so that the sources are identified in the final dataset.

Now you can use by-group processing to summarize for each month, each account, etc

parmis
Fluorite | Level 6

Could you explain more about "using the indsname option"?

I changed the Amount variable to be the same in both data sets, and I used data step to concatenate the data. now I have one data set with four variables.

Now, what would be the next step?

 

Thanks

Reeza
Super User
data class1; set sashelp.class; run;
data class2; set sashelp.class; run;

data class3; set sashelp.class; run;

data combined;
set class1-class3 indsname=source;
dsn_input = source;
run;

proc means data=combined n mean std max min median;
class dsn_input;
var weight;
run;


@parmis wrote:

Could you explain more about "using the indsname option"?

I changed the Amount variable to be the same in both data sets, and I used data step to concatenate the data. now I have one data set with four variables.

Now, what would be the next step?

 

Thanks




Kurt_Bremser
Super User

@parmis wrote:

Could you explain more about "using the indsname option"?

I changed the Amount variable to be the same in both data sets, and I used data step to concatenate the data. now I have one data set with four variables.

Now, what would be the next step?

 

Thanks


I can see that, while I was resting in Morpheus' arms, @Reeza gave you a nice example. Use it as blueprint for your datasets.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1055 views
  • 0 likes
  • 3 in conversation