Solved: selecting good matches by Proc SQL

niam · Posted 09-30-2013 12:40 AM

I have a data set of 434 observation (named as selection) and another data-set with about 17000 observations (named as nodup). The first data-set is my main analysis group (study group) and I want to select one control match from the second data-set for each of the observations in the study group. The matching variables are SIC, FYEAR, SALE. the first two should be exact matches, while SALE should be between a range, defined by lowsale and highsale variables.

If I run this PROC SQL, I will get multiple matches for some of the observation in my study group, while for some others, there is only one match.

proc sql;

create table controls_id as

select * from selection, nodup

where ((nodup.xSALE between selection.lowsale and selection.highsale) and

and selection.SIC=nodup.xsic

and selection.fyear=nodup.xfyear );

quit;

When I have multiple matches, I want to select the closest match for my observation. the closeness is defined as ((sale-xsale)^2). In other words, I want to select the match which has the smallest difference in the sale variable.

Would you please let me know how I can perform this step?

Thanks for reading this post

LinusH · Posted 09-30-2013 02:24 AM

Create the closiness column, and make use of GROUP BY and HAVING (max(...)) clauses.

Data never sleeps

View solution in original post

LinusH · Posted 09-30-2013 02:24 AM

Create the closiness column, and make use of GROUP BY and HAVING (max(...)) clauses.

Data never sleeps

selecting good matches by Proc SQL

Re: selecting good matches by Proc SQL

Re: selecting good matches by Proc SQL

selecting good matches by Proc SQL

Re: selecting good matches by Proc SQL

Re: selecting good matches by Proc SQL

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away