Hi,
I have a big dataset (A) of patients containing millions of rows and 50 columns. I need to do Multiple Hash lookups of the table using Macros. For example, I have patient info below by Patient Id.
Pt_Id | Loc | HQ | Inv | Fill_Loc | Fill_Qty |
1058 | STL | ORD | 500 | MEM | 100 |
1099 | PIT | MEM | 50 | STL | 110 |
1058 | MCI | MEM | 50 | ORD | 120 |
1058 | ORD | MEM | 50 | ORI | 110 |
1059 | CDL | IAD | 100 | MCI | 0 |
2051 | HOW | HIT | 100 | IAD | 4 |
2045 | IND | IAD | 5 | MCI | 5 |
2051 | MRD | IAD | 1 | IND | 50 |
2051 | ORL | IAD | 2 | LTY | 80 |
2051 | STK | IAD | 5 | ORF | 40 |
1. I need to do Hash lookup on Pt_Id and Fill_Loc. For example, Pt_Id, Fill_Loc
Pt_Id | Fill_Loc | Fill_Qty |
1058 | MEM | 100 |
2. Using the result from 1, I need to do look-up on Pt_id and Fill_loc from 1 will become HQ.....
Pt_Id | HQ |
1058 | MEM |
3. From 1, Patient No. 1058 had a Fill_Qty of 100. From 2, I need to lookup Pt_Id, HQ and get Inv. If Fill_Qty was 100 then I need to show Inv of 100.
Pt_Id | Loc | Inv |
1058 | MCI | 50 |
1058 | ORD | 50 |
4. The total Inv should not be more them Fill_Qty. If Inv. was more then Fill_Qty then I need to check each record and reduce the last record so that the total will be equal to Fill_Qty.
Below is another example where Inv was more than Fill_Qty and I'll adjust the last inv. record to be equal to Fill_Qty.
1. Lookup
Pt_Id | Fill_Loc | Fill_Qty |
2051 | IAD | 4 |
2.
Pt_Id | HQ | Inv |
2051 | IAD | 1 |
2051 | IAD | 2 |
2051 | IAD | 5 |
3. Result - Total of Inv. = Fill_Qty
Pt_Id | HQ | Inv |
2051 | IAD | 1 |
2051 | IAD | 2 |
2051 | IAD | 1 |
The last row was adjusted from 5 to 1 so that the total of Inv matches Fill_Qty.
Could you please supply (as a first example) the complete result (all observations, all variables) for patient 1058?
Here is the full dataset for 1058.
Pt_Id | Loc | HQ | Inv | Fill_Loc | Fill_Qty |
1058 | STL | ORD | 500 | MEM | 100 |
1058 | MCI | MEM | 50 | ORD | 120 |
1058 | ORD | MEM | 50 | ORI | 110 |
1058 | FIT | MEM | 10 | MEM | 10 |
So, just to get things started, here is the first part (1).
data have;
input Pt_Id $ Loc $ HQ $ Inv Fill_Loc $ Fill_Qty;
datalines;
1058 STL ORD 500 MEM 100
1099 PIT MEM 50 STL 110
1058 MCI MEM 50 ORD 120
1058 ORD MEM 50 ORI 110
1059 CDL IAD 100 MCI 0
2051 HOW HIT 100 IAD 4
2045 IND IAD 5 MCI 5
2051 MRD IAD 1 IND 50
2051 ORL IAD 2 LTY 80
2051 STK IAD 5 ORF 40
;
data want;
length Fill_Qty 8;
if _N_ = 1 then do;
declare hash h(dataset:"have");
h.defineKey('Pt_Id','Fill_Loc');
h.defineData('Fill_Qty');
h.defineDone();
call missing(Fill_Qty);
end;
Pt_Id="1058";Fill_Loc="MEM";
h.find();
run;
Next: From question 2: "Fill_loc from 1 will become HQ".. I don't understand that? Please elaborate
You cannot hard code Pt_id and Fill_Loc. It is basically a loop. In the first run it is looking for 1058 and MEM Second run it will look for 1099 and STl and so on.
Coming to your question about Fill_Location will be HQ ....So, the Ist lookup is Pt_id and Fill_Loc. Now, the output has to be looked up under Pt_id and HQ.....MEM has to be filtered under Variable HQ.
Pt_Id | Fill_Loc |
1058 | MEM |
Pt_Id | HQ |
1058 | MEM |
If you're after fully working code then start by providing representative sample data for everything - and provide this sample data in form of tested SAS data steps so we've got something to start with.
Then explain in detail the problem you have to solve - one step at a time.
May be break up the problem into steps, give it a first go and come back here asking questions for the things you get stuck with - and also post the code you've already developed.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.