I am trying to speed up a process that uses three years' worth of hospital data. My goal is to process only the most recent three months of data and append it to the existing large dataset. If a patient identifier appears in both tables, I want to keep only the data from the new table. For example:

```sas
data have_a;
  input ID amount;
  datalines;
1 10
3 15
4 20
7 10
7 15
9 12
10 14
;
run;

data have_b;
  input ID amount;
  datalines;
2 15
3 20
4 10
4 15
5 12
7 20
8 15
9 10
11 20
;
run;

data test;
  merge have_a have_b;
  by ID;
run;
```

My goal: if have_a and have_b have any ID in common, only the rows from have_b should be kept for that ID. However, as it stands, ID 7 has two rows in the final dataset (with amounts 20 and 15) when it should have only one. This is a heavily simplified version of my real data, which contains 20+ columns and millions of rows.
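The duplicate comes from how MERGE pairs rows inside a BY group: have_a has two rows for ID 7 and have_b has only one, so the first output row takes amount 20 from have_b, while the second output row only gets a new value from have_a (15). A minimal sketch of one way to get "new table wins" behaviour, assuming both tables are sorted by ID (as in the example): first collect the distinct IDs present in the new data, drop every matching ID from the old data, then interleave what is left with the new data. The dataset names `new_ids`, `old_only` and `want` are hypothetical, and the sample data is re-entered in compact form only so the sketch runs on its own.

```sas
/* Sample data from the question (compact form); replace with the real tables */
data have_a;
  input ID amount @@;
  datalines;
1 10 3 15 4 20 7 10 7 15 9 12 10 14
;
run;

data have_b;
  input ID amount @@;
  datalines;
2 15 3 20 4 10 4 15 5 12 7 20 8 15 9 10 11 20
;
run;

/* Distinct IDs that appear in the new (recent) data */
proc sort data=have_b(keep=ID) out=new_ids nodupkey;
  by ID;
run;

/* Keep only old rows whose ID does NOT appear in the new data */
data old_only;
  merge have_a(in=in_old) new_ids(in=in_new);
  by ID;
  if in_old and not in_new;
run;

/* Interleave the surviving old rows with all new rows, keeping ID order */
data want;
  set old_only have_b;
  by ID;
run;
```

With the sample data, `want` should contain 11 rows: IDs 1 and 10 come from have_a, every other ID comes from have_b, and ID 7 appears once with amount 20. Because MERGE and SET with BY both require input ordered by ID, this relies on the large table already being kept sorted by ID; if it is not, sorting it will likely dominate the run time on millions of rows.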