Morning all,
I'd like to know how you would approach the following problem which I am trying to automate:
We have a lookup table in excel that has 4 variables plus a start date and an end date and finally a value to be returned. For example...
Have
data have;
infile datalines dsd missover;
input county $ town $ school $ class $ start_date :ddmmyy10. end_date :ddmmyy10. Value_To_Return;
format start_date end_date date9.;
cards;
"Essex", "Braintree", "Tabor", "Blue", 12/10/2017, 31/07/2018, 100
"Essex", "Braintree", "Tabor", "Blue", 01/08/2018, 31/12/2018, 150
"Essex", "Braintree", "Tabor", "Blue", 01/01/2019, 15/04/2019, 125
"Essex", "Braintree", "Tabor", "Blue", 16/04/2019, 31/12/2099, 140
"Essex", "Braintree", "Tabor", "Red", 12/10/2017, 31/07/2018, 100
"Essex", "Braintree", "Tabor", "Red", 01/08/2018, 31/12/2018, 140
"Essex", "Braintree", "Tabor", "Red", 01/01/2019, 15/04/2019, 130
"Essex", "Braintree", "Tabor", "Red", 16/04/2019, 31/12/2099, 140
;
run;
Want
I want to get the Value_To_Return field from any dataset that has the first 4 variables to match on, plus a variable called Event_Date which should be used as the data look up for between Start_Date and End_Date.
Is there a multi variable Format Lookup I could use?
Obviously the example above only has 8 rows, but the reality is many thousands.
Many thanks
I want to get the Value_To_Return field from any dataset that has the first 4 variables to match on, plus a variable called Event_Date which should be used as the data look up for between Start_Date and End_Date.
I don't understand this. Explain step-by-step, with an example or two or three. Thanks.
I have the dataset in the SAS code (albeit it exists in Excel at the moment)
I want to be able to use any other dataset I have to lookup the final value (Value_To_Return). So if I have another dataset that has:
County - "Essex", Town - "Braintree", School - "Tabor", Class - "Blue", Event_Date - 01/10/2018 Then I would like the value of 150 to be returned.
Okay, thank you, I think I understand. This seems to be an SQL join between the first table and the second table, by date range. Assuming your dates are actual SAS date values and not character strings, then this ought to work
proc sql;
create table want as select h.*,l.event_date from
have as h left join lookup as l
on h.county=l.county and h.town=l.town and h.school=l.school
and h.class=l.class and l.event_date>=h.start_date and l.event_date<=h.end_date;
quit;
You could do this in a DATA step with the MERGE command.
Thank you.
Yes, if you can do it in SQL you can also do it as a merge, but i'm specifically after a method to do it as a lookup from within any datastep as per my request.
Cheers
data want;
merge have lookup;
by county town school class;
if start_date<=event_date<=end_date;
run;
This assumes both data sets are sorted by COUNTY TOWN SCHOOL CLASS.
You're not getting what I mean by lookup within a datastep Paige...
data Lookup;
infile datalines dsd missover;
input county $ town $ school $ class $ start_date :ddmmyy10. end_date :ddmmyy10. Value_To_Return;
format start_date end_date date9.;
cards;
"Essex", "Braintree", "Tabor", "Blue", 12/10/2017, 31/07/2018, 100
"Essex", "Braintree", "Tabor", "Blue", 01/08/2018, 31/12/2018, 150
"Essex", "Braintree", "Tabor", "Blue", 01/01/2019, 15/04/2019, 125
"Essex", "Braintree", "Tabor", "Blue", 16/04/2019, 31/12/2099, 140
"Essex", "Braintree", "Tabor", "Red", 12/10/2017, 31/07/2018, 120
"Essex", "Braintree", "Tabor", "Red", 01/08/2018, 31/12/2018, 140
"Essex", "Braintree", "Tabor", "Red", 01/01/2019, 15/04/2019, 130
"Essex", "Braintree", "Tabor", "Red", 16/04/2019, 31/12/2099, 140
;
run;
Data Have;
infile datalines dsd missover;
input county $ town $ school $ class $ eventdate :ddmmyy10.;
format eventdate date9.;
cards;
"Essex", "Braintree", "Tabor", "Blue", 15/3/2019
"Essex", "Braintree", "Tabor", "Blue", 01/10/2018
"Essex", "Braintree", "Tabor", "Red", 12/01/2019
;
run;
For this example, I would want to be working with the Have dataset and get the Value_To_Return where all 4 character variables match and also where eventdate was <= start_date and <= end_date.
The data in these instances would return 125 on row 1, 150 on row 2 and 130 on row 3.
I have written the Proc FCMP that looks up the character variables, but I dont know how to adapt it to include the date range lookup as well.
proc fcmp outlib=work.functions.samples;;
function get_score(county $, town $, school $, class $);
declare hash Lkup(dataset: "work.Lookup");
rc = lkup.definedata("Value_To_Return");
rc=lkup.definekey("county", "town", "school", "class");
rc=lkup.definedone();
rc=lkup.find();
if rc eq 0 then return(Value_To_Return);
else return(.);
endsub;
quit;
options cmplib=work.functions;
data want;
set have;
returned_value = get_score(county, town, school, class);
run;
Any suggestions on how to include a clause within the hash lookup to search for eventdate being between start_date and end_date?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.