Without knowing details of what you might consider a "match", here is a very generic way.
The first two datasets just build some dummy data with a name and value. The Proc sql compares the sales values (assumes that is what you want to match on, not much else in this data) and selects matches when the difference is 300 or less. Depending on the number of matches you might want you would adjust the value of that comparison parameter up or down.
If you only want the compare company to "match" one in the treatment set it will be up to you to remove the duplicates which would not be very difficult (left as an exercise for the interested reader).
If you multiple values to match on describing the space could be a simple as adding AND for the different variables.
Refinements but I'm too lazy to try to code today would be to identify the minimum absolute difference and only select the compare companies with that value.
I am not making ANY claim to efficiency in the code. Simple and brute force. Careful consideration of the differences at first step may help.
data treat;
input name $ sales;
datalines;
abc 10000
cdf 20000
ged 30000
;
run;
data compare;
length compname $ 8;
do a='S','T','U','V','X','Y','Z';
do b='S','T','U','V','X','Y','Z';
do c='S','T','U','V','X','Y','Z';
do d='S','T','U','V','X','Y','Z';
compname=catt(a,b,c,d);
compsales = round(100000*rand('uniform'),100);
output;
end;
end;
end;
end;
drop a b c;
run;
proc sql;
create table compgroup as
select a.*,b.*
from treat as a, compare as b
where abs(a.sales-b.compsales) < 300;
quit;
A real rough other approach would be to combine the two data sets with a variable that contains which set each record comes from.
Sort on the comparison values.
Extract the treatment records and the one or two following the treatment record (very easy) and possibly one or two preceding ( a litle more work).
How do you want to compare them? From your question only, it could be on the first letter of their name .
Without knowing details of what you might consider a "match", here is a very generic way.
The first two datasets just build some dummy data with a name and value. The Proc sql compares the sales values (assumes that is what you want to match on, not much else in this data) and selects matches when the difference is 300 or less. Depending on the number of matches you might want you would adjust the value of that comparison parameter up or down.
If you only want the compare company to "match" one in the treatment set it will be up to you to remove the duplicates which would not be very difficult (left as an exercise for the interested reader).
If you multiple values to match on describing the space could be a simple as adding AND for the different variables.
Refinements but I'm too lazy to try to code today would be to identify the minimum absolute difference and only select the compare companies with that value.
I am not making ANY claim to efficiency in the code. Simple and brute force. Careful consideration of the differences at first step may help.
data treat;
input name $ sales;
datalines;
abc 10000
cdf 20000
ged 30000
;
run;
data compare;
length compname $ 8;
do a='S','T','U','V','X','Y','Z';
do b='S','T','U','V','X','Y','Z';
do c='S','T','U','V','X','Y','Z';
do d='S','T','U','V','X','Y','Z';
compname=catt(a,b,c,d);
compsales = round(100000*rand('uniform'),100);
output;
end;
end;
end;
end;
drop a b c;
run;
proc sql;
create table compgroup as
select a.*,b.*
from treat as a, compare as b
where abs(a.sales-b.compsales) < 300;
quit;
A real rough other approach would be to combine the two data sets with a variable that contains which set each record comes from.
Sort on the comparison values.
Extract the treatment records and the one or two following the treatment record (very easy) and possibly one or two preceding ( a litle more work).
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.