Hi,
I'm new to Hash and am trying to determine if my problem at hand would be a good one for it. In a nutshell, what I to do is make a selection for each account based on certain crieria...
1) each "account" must only have 1 record.
2) i want to add 1 to the vendor total every time a vendor is selected.
3) if a vendor is, or becomes, over the limit using the total from criteria 2, then i need to do a check to see if there is another vendor for the account and select the next vendor (under-limit) if there is, and if not, then select the current vendor.
If there is any easier non-hash way to do this as well I would be open to those responses as well..I've included the "have" and "want" datasets below.
Thanks in advance,
Tom
data have;
input account vendor $ limit;
datalines;
101 a 2
101 b 3
102 a 2
103 a 2
105 a 2
104 a 2
102 b 3
104 c 2
103 b 2
;run;
data want;
input account vendor $ limit selected_vendor $ vendor_total;
datalines;
101 a 2 a 1
102 a 2 a 2
105 a 2 a 3
104 c 2 c 1
103 b 2 b 1
;run;
I would suggest presorting your incoming data by account, vendor, so your outcome gets to be more consistent, otherwise, depending on which vendor gets into the account first, the final outcome may vary. The following code may need some cleaning, but it seems does what you want;
data have;
input account vendor $ limit;
datalines;
101 a 2
101 b 3
102 a 2
103 a 2
105 a 2
104 a 2
102 b 3
104 c 2
103 b 2
;
run;
data _null_;
if _n_=1 then
do;
if 0 then
set have(rename=vendor=_v drop=limit);
declare hash h1(dataset:'have(rename=vendor=_v drop=limit)', multidata:'y');
h1.definekey('account');
h1.definedata(all:
'y');
h1.definedone();
declare hash want();
want.definekey('account');
want.definedata('account','vendor','limit','vendor_total');
want.definedone();
declare hiter ht_want('want');
declare hash ck();
ck.definekey('vendor');
ck.definedata('vendor','limit','vendor_total');
ck.definedone();
end;
set have end=done;
rc_want=want.check();
if rc_want ne 0 then
do;
rc_ck=ck.find();
if rc_ck ne 0 or (rc_ck=0 and vendor_total < limit) then
do;
if rc_ck ne 0 then
vendor_total=1;
else vendor_total+1;
rc_ck=ck.replace();
rc_want=want.add();
end;
else if rc_ck =0 and vendor_total >= limit then
do;
do rc=h1.find() by 0 while (rc=0);
rc_ck=ck.find(key:_v);
if rc_ck ne 0 or (rc_ck=0 and vendor_total < limit) then
do;
if rc_ck ne 0 then
vendor_total=1;
else vendor_total+1;
vendor=_v;
rc_ck=ck.replace();
rc_want=want.add();
leave;
end;
else rc=h1.find_next();
end;
if rc ne 0 then
do;
rc_ck=ck.find();
vendor_total+1;
rc_ck=ck.replace();
rc_want=want.add();
end;
end;
end;
if done then rc=want.output(dataset:'work.want');
run;
Please provide an example Have data set that actually can actually create the desired output. I do not see how account 105 gets a vendor_total of 3.
And why is there no output for account 101 and vendor b?
Every time a vendor is selected, i need the number of times it has been selected to keep a running total. So, vendor "a" was chosen with account 101, so the total selected is 1...then it was selected again with account 102, so total becomes 2. then it is not selected with 103 because this vendor has already hit the limit of 2, and 103 has another vendor it could use. Then for 105, it is the only vendor for this account, so it must select vendor "a", giving vendor "a" a running total of 3 times being selected.
I would suggest presorting your incoming data by account, vendor, so your outcome gets to be more consistent, otherwise, depending on which vendor gets into the account first, the final outcome may vary. The following code may need some cleaning, but it seems does what you want;
data have;
input account vendor $ limit;
datalines;
101 a 2
101 b 3
102 a 2
103 a 2
105 a 2
104 a 2
102 b 3
104 c 2
103 b 2
;
run;
data _null_;
if _n_=1 then
do;
if 0 then
set have(rename=vendor=_v drop=limit);
declare hash h1(dataset:'have(rename=vendor=_v drop=limit)', multidata:'y');
h1.definekey('account');
h1.definedata(all:
'y');
h1.definedone();
declare hash want();
want.definekey('account');
want.definedata('account','vendor','limit','vendor_total');
want.definedone();
declare hiter ht_want('want');
declare hash ck();
ck.definekey('vendor');
ck.definedata('vendor','limit','vendor_total');
ck.definedone();
end;
set have end=done;
rc_want=want.check();
if rc_want ne 0 then
do;
rc_ck=ck.find();
if rc_ck ne 0 or (rc_ck=0 and vendor_total < limit) then
do;
if rc_ck ne 0 then
vendor_total=1;
else vendor_total+1;
rc_ck=ck.replace();
rc_want=want.add();
end;
else if rc_ck =0 and vendor_total >= limit then
do;
do rc=h1.find() by 0 while (rc=0);
rc_ck=ck.find(key:_v);
if rc_ck ne 0 or (rc_ck=0 and vendor_total < limit) then
do;
if rc_ck ne 0 then
vendor_total=1;
else vendor_total+1;
vendor=_v;
rc_ck=ck.replace();
rc_want=want.add();
leave;
end;
else rc=h1.find_next();
end;
if rc ne 0 then
do;
rc_ck=ck.find();
vendor_total+1;
rc_ck=ck.replace();
rc_want=want.add();
end;
end;
end;
if done then rc=want.output(dataset:'work.want');
run;
There are something you need to clarify . What if 105 has multiple obs ? Why the last obs b=2 not b=3 ? data have; input account vendor $ limit; datalines; 101 a 2 101 b 3 102 a 2 103 a 2 105 a 2 104 a 2 102 b 3 104 c 2 103 b 3 ; run; %let dsid=%sysfunc(open(have)); %let nobs=%sysfunc(attrn(&dsid,nlobs)); %let dsid=%sysfunc(close(&dsid)); proc sort data=have(keep=vendor limit) out=key nodupkey; by vendor limit; run; data want; if _n_ eq 1 then do; array x{&nobs} _temporary_; array y{&nobs} $ 40 _temporary_; n=0; do until(last); set have end=last; n+1;x{n}=account;y{n}=vendor; end; end; set key end=end; count=0; do i=1 to &nobs; if vendor=y{i} then do; count+1; if count le limit then do; account=x{i};output; do j=1 to &nobs; if account=x{j} then call missing(x{j},y{j}); end; end; end; end; if end then do; do i=1 to &nobs; if not missing(x{i}) and not missing(y{i}) then do; account=x{i};vendor=y{i};output; end; end; end; drop n i j count ; run;
Thanks for your response...To answer your questions, if 105 had multiple vendors, than it should not select vendor "a". It would select whichever the other vendor was, as long as that one still within the limit.
Each vendor has a limit, so you can think about it like this:
Vendor "a" can have no more than 2 accounts assigned to them.
Vendor "b" can have no more than 3 accounts assigned to them.
Vendor "c" can have no more than 2 accounts assigned to them.
The only exception would be, that if there is only 1 vendor available for each account, then the program would need to select that vendor, weather they are within the limit or not.
hope this helps, thanks for the quick response
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.