Hi,
I'm trying to inner join two datasets with 4 field in each. 3 of the fields are the same in both tables;
data set Nov: ACCTNB, ID, SourceACCT, Amount_Nov
data set Dec: ACCTNB, ID, SourceACCT, Amount_Dec
I'm trying o inner join these data sets to compare the amounts and see if customers are paying different amount in two months. The problem is if there is one transaction for ID1 in Dec and three transactions for the same ID in Nov, SAS duplicates the Dec data three times to match Nov data.
how can I solve this problem?
Thank you
Name the amount variable same in all datasets.
Use a data step to concatenate all datasets, and use the indsname option, so that the sources are identified in the final dataset.
Now you can use by-group processing to summarize for each month, each account, etc
This means you have duplicates that you need to remove OR you need to add a condition to your join or WHERE to filter out the records.
Determine why some have duplicates first, or how you would identify which records to keep and then we can help you out.
@parmis wrote:
Hi,
I'm trying to inner join two datasets with 4 field in each. 3 of the fields are the same in both tables;
data set Nov: ACCTNB, ID, SourceACCT, Amount_Nov
data set Dec: ACCTNB, ID, SourceACCT, Amount_Dec
I'm trying o inner join these data sets to compare the amounts and see if customers are paying different amount in two months. The problem is if there is one transaction for ID1 in Dec and three transactions for the same ID in Nov, SAS duplicates the Dec data three times to match Nov data.
how can I solve this problem?
Thank you
Name the amount variable same in all datasets.
Use a data step to concatenate all datasets, and use the indsname option, so that the sources are identified in the final dataset.
Now you can use by-group processing to summarize for each month, each account, etc
Could you explain more about "using the indsname option"?
I changed the Amount variable to be the same in both data sets, and I used data step to concatenate the data. now I have one data set with four variables.
Now, what would be the next step?
Thanks
data class1; set sashelp.class; run;
data class2; set sashelp.class; run;
data class3; set sashelp.class; run;
data combined;
set class1-class3 indsname=source;
dsn_input = source;
run;
proc means data=combined n mean std max min median;
class dsn_input;
var weight;
run;
@parmis wrote:
Could you explain more about "using the indsname option"?
I changed the Amount variable to be the same in both data sets, and I used data step to concatenate the data. now I have one data set with four variables.
Now, what would be the next step?
Thanks
@parmis wrote:
Could you explain more about "using the indsname option"?
I changed the Amount variable to be the same in both data sets, and I used data step to concatenate the data. now I have one data set with four variables.
Now, what would be the next step?
Thanks
I can see that, while I was resting in Morpheus' arms, @Reeza gave you a nice example. Use it as blueprint for your datasets.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.