Hello,
I have two datasets that look like the following:
Dataset 1:
sbdv | num | loc |
150 | 0.876 | 1 |
160 | 0.993 | 1 |
171 | 0.937 | 1 |
Dataset 2:
sbdv | num | loc |
120 | 0.987 | 0 |
121 | 0.965 | 0 |
123 | 0.945 | 0 |
125 | 0.912 | 0 |
I want to create a new dataset that contains all of dataset 1 plus a match of observations in dataset 2 of the observations with the minimum difference in 'num' between each obs in dataset 1 and all the obs in dataset 2.
So for sbdv = 150, I want to calculate the difference between its 'num' (0.876) and sbdv 120, 121, 123, and 125's respective 'num' values (0.987, 0.965, 0.945, 0.912). Then I want to save each 'match' to a new dataset containing all of dataset 1 and the matches for each observation.
So for this example, the new dataset would be:
sbdv | num | loc |
150 | 0.876 | 1 |
160 | 0.993 | 1 |
171 | 0.937 | 1 |
125 | 0.912 | 0 |
120 | 0.987 | 0 |
123 | 0.945 | 0 |
I am guessing there is a straight forward solution, but I haven't been able to find anything that explicitly will do this.
Assuming the pairing has to be one-to-one, this can be expressed as a linear assignment problem and solved easily with PROC OPTNET (a proc of SAS/OR):
data ds1;
input sbdv num loc;
datalines;
150 0.876 1
160 0.993 1
171 0.937 1
;
data ds2;
input sbdv num loc;
datalines;
120 0.987 0
121 0.965 0
123 0.945 0
125 0.912 0
;
proc sql;
create table pairs as
select
ds1.sbdv as from,
ds2.sbdv as to,
abs(ds1.num-ds2.num) as weight
from ds1, ds2;
quit;
proc optnet data_links=pairs direction=directed;
linear_assignment out=selected_pairs;
run;
proc sql;
create table want as
select sbdv, num, loc
from
(select sbdv, num, loc from selected_pairs, ds1 where from=sbdv)
union all
(select sbdv, num, loc from selected_pairs, ds2 where to=sbdv);
quit;
You don't need to know which observation from ds 2 match each observation from ds 1?
No, just match ds 1 obs to ds 2 based on ds1 num - ds2 num
When you say "minimum difference" do you mean absolute value of difference?
Assuming the pairing has to be one-to-one, this can be expressed as a linear assignment problem and solved easily with PROC OPTNET (a proc of SAS/OR):
data ds1;
input sbdv num loc;
datalines;
150 0.876 1
160 0.993 1
171 0.937 1
;
data ds2;
input sbdv num loc;
datalines;
120 0.987 0
121 0.965 0
123 0.945 0
125 0.912 0
;
proc sql;
create table pairs as
select
ds1.sbdv as from,
ds2.sbdv as to,
abs(ds1.num-ds2.num) as weight
from ds1, ds2;
quit;
proc optnet data_links=pairs direction=directed;
linear_assignment out=selected_pairs;
run;
proc sql;
create table want as
select sbdv, num, loc
from
(select sbdv, num, loc from selected_pairs, ds1 where from=sbdv)
union all
(select sbdv, num, loc from selected_pairs, ds2 where to=sbdv);
quit;
data ds1; input sbdv num loc; datalines; 150 0.876 1 160 0.993 1 171 0.937 1 ; data ds2; input _sbdv _num _loc; datalines; 120 0.987 0 121 0.965 0 123 0.945 0 125 0.912 0 ; data want; if _n_=1 then do; if 0 then set ds2; declare hash h(dataset:'ds2'); declare hiter hi('h'); h.definekey('_sbdv'); h.definedata(all:'y'); h.definedone(); end; set ds1; min=999999; do while(hi.next()=0); dis=abs(num-_num); if dis<min then do;min=dis;sbdv1=_sbdv;num1=_num;loc1=_loc;end; end; h.remove(key:sbdv1); drop _: min dis; run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.