I have a data set of 434 observation (named as selection) and another data-set with about 17000 observations (named as nodup). The first data-set is my main analysis group (study group) and I want to select one control match from the second data-set for each of the observations in the study group. The matching variables are SIC, FYEAR, SALE. the first two should be exact matches, while SALE should be between a range, defined by lowsale and highsale variables.
If I run this PROC SQL, I will get multiple matches for some of the observation in my study group, while for some others, there is only one match.
proc sql;
create table controls_id as
select * from selection, nodup
where ((nodup.xSALE between selection.lowsale and selection.highsale) and
and selection.SIC=nodup.xsic
and selection.fyear=nodup.xfyear );
quit;
When I have multiple matches, I want to select the closest match for my observation. the closeness is defined as ((sale-xsale)^2). In other words, I want to select the match which has the smallest difference in the sale variable.
Would you please let me know how I can perform this step?
Thanks for reading this post
Create the closiness column, and make use of GROUP BY and HAVING (max(...)) clauses.
Create the closiness column, and make use of GROUP BY and HAVING (max(...)) clauses.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.