01-07-2016 02:23 PM
I need to join a few tables together where I need to identify the maximum value, if I was using SQL I would use an inner join max group by statement to find the most current agreement_id in table 1 for each customer store in a temp table and join it back to table 2, struggling to come up with a Data Step that would accomplish same prior
Agreement_Id Contract_id Implementation_Date Expiration_date
1 1 1/1/2016 0:00 1/1/2018 0:00
2 2 1/1/1994 0:00 NULL
3 1 1/1/2004 0:00 1/1/2020 0:00
4 1 1/7/2016 0:00 NULL
01-07-2016 02:35 PM
01-07-2016 02:48 PM
This is just one of many data steps and most has been written in BASE already Reeza, plus honestly I'd just like to learn how to do it in SAS vs PROC SQL as I'm sure there is a way to do it.
The output data would be.
Customer_Id Agreement_id Contract_id Implementation_date Expiration_date
1234 3 1 1/1/2004 1/1/2020
1235 4 1 1/7/2016 NULL
01-07-2016 03:26 PM - edited 01-07-2016 03:39 PM
Maximum value of what?
If Agreement_id is what you are looking for, then Are your agreement_id values actually numeric? If they aren't numeric "maximum" may be a tad ambiguous.
Here's my guess of what you may be asking. Note: for example data I made your datetime values dates and replaced NULL with SAS missings.
data Set1; input Customer_id Agreement_id ; datalines; 1234 1 1234 2 1234 3 1235 4 ; run; data Set2; informat Implementation_Date mmddyy. Expiration_date mmddyy. ; format Implementation_Date mmddyy10. Expiration_date mmddyy10. ; input Agreement_Id Contract_id Implementation_Date Expiration_date ; datalines; 1 1 1/1/2016 1/1/2018 2 2 1/1/1994 . 3 1 1/1/2004 1/1/2020 4 1 1/7/2016 . ; run; /* I'm guessing the customer data and contract data are matched using the single key agreement_id*/ proc sort data=set1;by agreement_id;run; proc sort data=set2;by agreement_id;run; data merged; merge set1 set2; by agreement_id; run; /* may not be needed for example data but the order will be important in the next step
with more complex data*/
proc sort data=merged; by customer_id agreement_id;run;
/* LAST value of a sorted group should be the record associated with the max value*/
data want; set merged; by customer_id agreement_id; if last.customer_id; run;
01-07-2016 03:46 PM
I can give you an approach, but I don't have time to write the code. I also may be overcomplicating the problem, assuming that there will be a third table not illustrated here with records for a subset of CUSTOMER_IDs.
Sort Table1 by CUSTOMER_ID AGREEMENT_ID.
Create hash table #1: Map from CUSTOMER_ID to AGREEMENT_ID. When loading the hash table, use the REPLACE method so you are left with one entry per CUSTOMER_ID, with the maximum AGREEMENT_ID.
Create hash table #2: Map from AGREEMENT_ID to the other data elements in Table2.
From your unnamed table3 data, read a record. That gives you CUSTOMER_ID. Use the first hash table to look up the AGREEMENT_ID, then use the second hash table to retrieve the remaining fields. Be sure to pay attention to the possibility of "no match" being found at each point.
Even if the problem isn't that complicated, and you only need to match Table1 and Table2, this can be done. Subsetting Table1 would become a relatively easy step, and only Table2 would need to be loaded into a hash table.
01-07-2016 05:30 PM
If you're looking for the latest implementation date, that's doable. The 'easy' way to do this in SAS is to sort the data so that the record you want is the first or last record per group and then use the BY processing to isolate that record.
BY processing and the first/last automatic variables are very powerful, and often something that's 'difficult' to replicate in SQL so it's definitely worth learning. By difficult I mean it takes more steps and isn't as intuitive.
I think @ballardw solution implements this methodology.
Good Luck with your SAS programming.