Hello,
I have three large data sets that I merge into one data set data set - one has 20121 rows and data set 2 and 3 have 20143 rows I sort them all on the Emp identifier then run the merge.
After I merge the three I end up with 20143 rows but I end up with 22 members with a duplicate Contribution amount gets added twice with the merge.
To resolve it I take my finished report exported into Excel and find duplicate SSN and high light them
Then I filter the 20143 rows on the color high light to bring them to the top of the list
Then I manually update the Emp Contribtion amounts to zero for one of the duplicate values - this is where NODUPRECS NODUPKEY dropped the entire dup rows and the other variables amounts in the rows are needed or it tosses off my other column amounts.
I am looking for a modify or update data step or SQL or case that I can clean up those rows with a dup member identifier that also have a dup contribution amount to update one of them to zero.
Any help is greatly appreciated.
SR-
data abc_Merged_001AAa; MERGE abc_STEP2_SORTEDA abc_STEP3_SORTED_A abc_STEP1_SORTEDa ; by ami ; RUN;
NOTE: MERGE statement has more than one data set with repeats of BY values. NOTE: There were 20143 observations read from the data set WORK.abc_STEP2_SORTEDA. NOTE: There were 20121 observations read from the data set WORK.abc_STEP3_SORTED_D. NOTE: There were 20143 observations read from the data set WORK.abc_STEP1_SORTEDA. NOTE: The data set WORK.abc_MERGED_001AAA has 20143 observations and 21 variables.
... View more