Ok, so I just tried to replicate this code with an updated dataset, but for some reason the results aren't right (as in, the final number of observations is way more than it should be). So, this time, I'm merging HMDA data (1,906,993 observations) with AHS data (22,511 observations). This is the code I am using: Proc sql; create table AHS_HMDA as select L.* R.*, min(hmdaincome-ahsincome) as min_income from ahs as L left join hmda as R on L.smsa=R.smsa and L.race=R.race and L.sex=R.sex and L.ahsincome <= R.hmdaincome group by L.smsa, L.race, L.sex having (hmdaincome-ahsincome) = calculated min_income; quit; This process is now resulting in over 100,000 observations, when there should not be any more observations than the AHS dataset (22,511). I'm basically trying to find matches between the two datasets, and only keeping the minimized difference between the income values of both datasets.... Last time I ran the code we had discussed before, I was left with a lot less observations. Not sure why it's different.
... View more