Hi,
I'm trying to inner join two datasets with 4 field in each. 3 of the fields are the same in both tables;
data set Nov: ACCTNB, ID, SourceACCT, Amount_Nov
data set Dec: ACCTNB, ID, SourceACCT, Amount_Dec
I'm trying o inner join these data sets to compare the amounts and see if customers are paying different amount in two months. The problem is if there is one transaction for ID1 in Dec and three transactions for the same ID in Nov, SAS duplicates the Dec data three times to match Nov data.
how can I solve this problem?
Thank you
Name the amount variable same in all datasets.
Use a data step to concatenate all datasets, and use the indsname option, so that the sources are identified in the final dataset.
Now you can use by-group processing to summarize for each month, each account, etc
This means you have duplicates that you need to remove OR you need to add a condition to your join or WHERE to filter out the records.
Determine why some have duplicates first, or how you would identify which records to keep and then we can help you out.
@parmis wrote:
Hi,
I'm trying to inner join two datasets with 4 field in each. 3 of the fields are the same in both tables;
data set Nov: ACCTNB, ID, SourceACCT, Amount_Nov
data set Dec: ACCTNB, ID, SourceACCT, Amount_Dec
I'm trying o inner join these data sets to compare the amounts and see if customers are paying different amount in two months. The problem is if there is one transaction for ID1 in Dec and three transactions for the same ID in Nov, SAS duplicates the Dec data three times to match Nov data.
how can I solve this problem?
Thank you
Name the amount variable same in all datasets.
Use a data step to concatenate all datasets, and use the indsname option, so that the sources are identified in the final dataset.
Now you can use by-group processing to summarize for each month, each account, etc
Could you explain more about "using the indsname option"?
I changed the Amount variable to be the same in both data sets, and I used data step to concatenate the data. now I have one data set with four variables.
Now, what would be the next step?
Thanks
data class1; set sashelp.class; run;
data class2; set sashelp.class; run;
data class3; set sashelp.class; run;
data combined;
set class1-class3 indsname=source;
dsn_input = source;
run;
proc means data=combined n mean std max min median;
class dsn_input;
var weight;
run;
@parmis wrote:
Could you explain more about "using the indsname option"?
I changed the Amount variable to be the same in both data sets, and I used data step to concatenate the data. now I have one data set with four variables.
Now, what would be the next step?
Thanks
@parmis wrote:
Could you explain more about "using the indsname option"?
I changed the Amount variable to be the same in both data sets, and I used data step to concatenate the data. now I have one data set with four variables.
Now, what would be the next step?
Thanks
I can see that, while I was resting in Morpheus' arms, @Reeza gave you a nice example. Use it as blueprint for your datasets.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.