Hello,
I have two datasets that look like the following:
Dataset 1:
sbdv | num | loc |
150 | 0.876 | 1 |
160 | 0.993 | 1 |
171 | 0.937 | 1 |
Dataset 2:
sbdv | num | loc |
120 | 0.987 | 0 |
121 | 0.965 | 0 |
123 | 0.945 | 0 |
125 | 0.912 | 0 |
I want to create a new dataset that contains all of dataset 1 plus a match of observations in dataset 2 of the observations with the minimum difference in 'num' between each obs in dataset 1 and all the obs in dataset 2.
So for sbdv = 150, I want to calculate the difference between its 'num' (0.876) and sbdv 120, 121, 123, and 125's respective 'num' values (0.987, 0.965, 0.945, 0.912). Then I want to save each 'match' to a new dataset containing all of dataset 1 and the matches for each observation.
So for this example, the new dataset would be:
sbdv | num | loc |
150 | 0.876 | 1 |
160 | 0.993 | 1 |
171 | 0.937 | 1 |
125 | 0.912 | 0 |
120 | 0.987 | 0 |
123 | 0.945 | 0 |
I am guessing there is a straight forward solution, but I haven't been able to find anything that explicitly will do this.
Assuming the pairing has to be one-to-one, this can be expressed as a linear assignment problem and solved easily with PROC OPTNET (a proc of SAS/OR):
data ds1;
input sbdv num loc;
datalines;
150 0.876 1
160 0.993 1
171 0.937 1
;
data ds2;
input sbdv num loc;
datalines;
120 0.987 0
121 0.965 0
123 0.945 0
125 0.912 0
;
proc sql;
create table pairs as
select
ds1.sbdv as from,
ds2.sbdv as to,
abs(ds1.num-ds2.num) as weight
from ds1, ds2;
quit;
proc optnet data_links=pairs direction=directed;
linear_assignment out=selected_pairs;
run;
proc sql;
create table want as
select sbdv, num, loc
from
(select sbdv, num, loc from selected_pairs, ds1 where from=sbdv)
union all
(select sbdv, num, loc from selected_pairs, ds2 where to=sbdv);
quit;
You don't need to know which observation from ds 2 match each observation from ds 1?
No, just match ds 1 obs to ds 2 based on ds1 num - ds2 num
When you say "minimum difference" do you mean absolute value of difference?
Assuming the pairing has to be one-to-one, this can be expressed as a linear assignment problem and solved easily with PROC OPTNET (a proc of SAS/OR):
data ds1;
input sbdv num loc;
datalines;
150 0.876 1
160 0.993 1
171 0.937 1
;
data ds2;
input sbdv num loc;
datalines;
120 0.987 0
121 0.965 0
123 0.945 0
125 0.912 0
;
proc sql;
create table pairs as
select
ds1.sbdv as from,
ds2.sbdv as to,
abs(ds1.num-ds2.num) as weight
from ds1, ds2;
quit;
proc optnet data_links=pairs direction=directed;
linear_assignment out=selected_pairs;
run;
proc sql;
create table want as
select sbdv, num, loc
from
(select sbdv, num, loc from selected_pairs, ds1 where from=sbdv)
union all
(select sbdv, num, loc from selected_pairs, ds2 where to=sbdv);
quit;
data ds1; input sbdv num loc; datalines; 150 0.876 1 160 0.993 1 171 0.937 1 ; data ds2; input _sbdv _num _loc; datalines; 120 0.987 0 121 0.965 0 123 0.945 0 125 0.912 0 ; data want; if _n_=1 then do; if 0 then set ds2; declare hash h(dataset:'ds2'); declare hiter hi('h'); h.definekey('_sbdv'); h.definedata(all:'y'); h.definedone(); end; set ds1; min=999999; do while(hi.next()=0); dis=abs(num-_num); if dis<min then do;min=dis;sbdv1=_sbdv;num1=_num;loc1=_loc;end; end; h.remove(key:sbdv1); drop _: min dis; run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.