I'm merging two data files and creating a cartesian join or merge. My end goal is to determine for each SELLER and BUYER combination, what products are matching
The Data looks as follows:
Dataset SELLERS
SELLERID P1 P2 P3........ P30
123456 1 1 1
234567 1 1
Dataset BUYERS
BUYERID P1 P2 P3........ P30
ab23456 1 1
XS98765 1
If a SELLER sells a product a '1' is indicated and similar for BUYERS. So in this case, SELLER 123456 sells P2,P3, and P30. BUYER ab23456 is interested in P2 and P3.
I could see creating a cartesion product using:
PROC SQL;
CREATE TABLE ALL as
SELECT *
FROM SELLERS BUYERS;
QUIT;
This would produce duplicate names for the products. I'm thinking some array processing could be useful.
I would be very tempted to change the data to a long format
SellerId ProductId
123456 P2
123456 P3
123456 P30
234567 P1
etc for both the Seller an buyer data.
then
Proc sql;
create table want as
select distinct a.productId, a.sellerId, b.buyerid
from SellerDatalong as a join BuyerDataLong as b
on a.productId = b.productid;
quit;
I would be very tempted to change the data to a long format
SellerId ProductId
123456 P2
123456 P3
123456 P30
234567 P1
etc for both the Seller an buyer data.
then
Proc sql;
create table want as
select distinct a.productId, a.sellerId, b.buyerid
from SellerDatalong as a join BuyerDataLong as b
on a.productId = b.productid;
quit;
thanks. I'm wondering using this approach what the output would looklike (Note: I do not currently have the data to mess with) and am awaiting to have SAS installed on my machine
The result would be a table like
ProductId SellerId Buyer
P1 123456 ABCDEF
P1 123456 ADFAS
P1 123456 XYz23
you could add Order by ProductID, SellerId, BuyerId to force a specific order.
Each product would have every seller matched with every buyer of that product.
The distinct clause prevents getting multiples of any specific combination.
This is one way to realize 's idea:
data seller;
input sellerid $ p1-p4;
cards;
423456 . 1 1 1
234567 1 . . 1
;
data buyer;
input buyerid $ p1-p4;
cards;
ab3456 . 1 1 .
xs4567 . 1 . 1
;
proc sort data=seller;
by sellerid;
run;
proc transpose data=seller out=s1 (where=(not missing(col1)));
var p1-p4;
by sellerid;
run;
proc sort data=buyer;
by buyerid;
run;
proc transpose data=buyer out=b1 (where=(not missing(col1)));
var p1-p4;
by buyerid;
run;
proc sql;
create table want (drop=col1) as
select * from s1, b1
where s1._name_=b1._name_;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.