Hi,
I have a big dataset (A) of patients containing millions of rows and 50 columns. I need to do Multiple Hash lookups of the table using Macros. For example, I have patient info below by Patient Id.
Pt_Id | Loc | HQ | Inv | Fill_Loc | Fill_Qty |
1058 | STL | ORD | 500 | MEM | 100 |
1099 | PIT | MEM | 50 | STL | 110 |
1058 | MCI | MEM | 50 | ORD | 120 |
1058 | ORD | MEM | 50 | ORI | 110 |
1059 | CDL | IAD | 100 | MCI | 0 |
2051 | HOW | HIT | 100 | IAD | 4 |
2045 | IND | IAD | 5 | MCI | 5 |
2051 | MRD | IAD | 1 | IND | 50 |
2051 | ORL | IAD | 2 | LTY | 80 |
2051 | STK | IAD | 5 | ORF | 40 |
1. I need to do Hash lookup on Pt_Id and Fill_Loc. For example, Pt_Id, Fill_Loc
Pt_Id | Fill_Loc | Fill_Qty |
1058 | MEM | 100 |
2. Using the result from 1, I need to do look-up on Pt_id and Fill_loc from 1 will become HQ.....
Pt_Id | HQ |
1058 | MEM |
3. From 1, Patient No. 1058 had a Fill_Qty of 100. From 2, I need to lookup Pt_Id, HQ and get Inv. If Fill_Qty was 100 then I need to show Inv of 100.
Pt_Id | Loc | Inv |
1058 | MCI | 50 |
1058 | ORD | 50 |
4. The total Inv should not be more them Fill_Qty. If Inv. was more then Fill_Qty then I need to check each record and reduce the last record so that the total will be equal to Fill_Qty.
Below is another example where Inv was more than Fill_Qty and I'll adjust the last inv. record to be equal to Fill_Qty.
1. Lookup
Pt_Id | Fill_Loc | Fill_Qty |
2051 | IAD | 4 |
2.
Pt_Id | HQ | Inv |
2051 | IAD | 1 |
2051 | IAD | 2 |
2051 | IAD | 5 |
3. Result - Total of Inv. = Fill_Qty
Pt_Id | HQ | Inv |
2051 | IAD | 1 |
2051 | IAD | 2 |
2051 | IAD | 1 |
The last row was adjusted from 5 to 1 so that the total of Inv matches Fill_Qty.
Could you please supply (as a first example) the complete result (all observations, all variables) for patient 1058?
Here is the full dataset for 1058.
Pt_Id | Loc | HQ | Inv | Fill_Loc | Fill_Qty |
1058 | STL | ORD | 500 | MEM | 100 |
1058 | MCI | MEM | 50 | ORD | 120 |
1058 | ORD | MEM | 50 | ORI | 110 |
1058 | FIT | MEM | 10 | MEM | 10 |
So, just to get things started, here is the first part (1).
data have;
input Pt_Id $ Loc $ HQ $ Inv Fill_Loc $ Fill_Qty;
datalines;
1058 STL ORD 500 MEM 100
1099 PIT MEM 50 STL 110
1058 MCI MEM 50 ORD 120
1058 ORD MEM 50 ORI 110
1059 CDL IAD 100 MCI 0
2051 HOW HIT 100 IAD 4
2045 IND IAD 5 MCI 5
2051 MRD IAD 1 IND 50
2051 ORL IAD 2 LTY 80
2051 STK IAD 5 ORF 40
;
data want;
length Fill_Qty 8;
if _N_ = 1 then do;
declare hash h(dataset:"have");
h.defineKey('Pt_Id','Fill_Loc');
h.defineData('Fill_Qty');
h.defineDone();
call missing(Fill_Qty);
end;
Pt_Id="1058";Fill_Loc="MEM";
h.find();
run;
Next: From question 2: "Fill_loc from 1 will become HQ".. I don't understand that? Please elaborate
You cannot hard code Pt_id and Fill_Loc. It is basically a loop. In the first run it is looking for 1058 and MEM Second run it will look for 1099 and STl and so on.
Coming to your question about Fill_Location will be HQ ....So, the Ist lookup is Pt_id and Fill_Loc. Now, the output has to be looked up under Pt_id and HQ.....MEM has to be filtered under Variable HQ.
Pt_Id | Fill_Loc |
1058 | MEM |
Pt_Id | HQ |
1058 | MEM |
If you're after fully working code then start by providing representative sample data for everything - and provide this sample data in form of tested SAS data steps so we've got something to start with.
Then explain in detail the problem you have to solve - one step at a time.
May be break up the problem into steps, give it a first go and come back here asking questions for the things you get stuck with - and also post the code you've already developed.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.