BookmarkSubscribeRSS Feed
RickyS
Quartz | Level 8

I need to join a few tables together where I need to identify the maximum value, if I was using SQL I would use an inner join max group by statement to find the most current agreement_id in table 1 for each customer store in a temp table and join it back to table 2, struggling to come up with a Data Step that would accomplish same prior 

Table 1

Customer_id      Agreement_id

1234                  1

1234                  2

1234                  3

1235                  4

 

Table 2             

Agreement_Id  Contract_id     Implementation_Date         Expiration_date

1                        1                      1/1/2016 0:00                   1/1/2018 0:00

2                        2                      1/1/1994 0:00                   NULL

3                        1                      1/1/2004 0:00                   1/1/2020 0:00

4                        1                      1/7/2016 0:00                   NULL

6 REPLIES 6
Reeza
Super User
So why not use Proc SQL?

Are those your input data sets? If so , please post expected output.A data step solution is also possible, but I think SQL may be easier in this case as well.
RickyS
Quartz | Level 8

This is just one of many data steps and most has been written in BASE already Reeza, plus honestly I'd just like to learn how to do it in SAS vs PROC SQL as I'm sure there is a way to do it.

 

The output data would be.

 

Customer_Id    Agreement_id  Contract_id    Implementation_date   Expiration_date

1234                 3                       1                    1/1/2004                      1/1/2020

1235                 4                       1                    1/7/2016                       NULL

ballardw
Super User

Maximum value of what?

If Agreement_id is what you are looking for, then Are your agreement_id values actually numeric? If they aren't numeric "maximum" may be a tad ambiguous.

 

Here's my guess of what you may be asking. Note: for example data I made your datetime values dates and replaced NULL with SAS missings.

 

data Set1;
input Customer_id      Agreement_id ;
datalines;
1234                  1
1234                  2
1234                  3
1235                  4
;
run;
 
data Set2;
   informat Implementation_Date    mmddyy.     Expiration_date mmddyy. ;
   format Implementation_Date    mmddyy10.     Expiration_date mmddyy10. ;
   input Agreement_Id  Contract_id     Implementation_Date       Expiration_date  ;
datalines;
1                        1                      1/1/2016                    1/1/2018 
2                        2                      1/1/1994                    .
3                        1                      1/1/2004                    1/1/2020 
4                        1                      1/7/2016                    .
;
run;
/* I'm guessing the customer data and contract data are matched using the single key agreement_id*/
proc sort data=set1;by agreement_id;run;
proc sort data=set2;by agreement_id;run;

data merged;
   merge set1 set2;
   by agreement_id;
run;
/* may not be needed for example data but the order will be important in the next step
with more complex data*/
proc sort data=merged; by customer_id agreement_id;run;
/* LAST value of a sorted group should be the record associated with the max value*/
data want; set merged; by customer_id agreement_id; if last.customer_id; run;
RickyS
Quartz | Level 8

These are integer values

Astounding
PROC Star

I can give you an approach, but I don't have time to write the code.  I also may be overcomplicating the problem, assuming that there will be a third table not illustrated here with records for a subset of CUSTOMER_IDs.

 

Sort Table1 by CUSTOMER_ID AGREEMENT_ID.

 

Create hash table #1:  Map from CUSTOMER_ID to AGREEMENT_ID.  When loading the hash table, use the REPLACE method so you are left with one entry per CUSTOMER_ID, with the maximum AGREEMENT_ID.

 

Create hash table #2:  Map from AGREEMENT_ID to the other data elements in Table2.

 

From your unnamed table3 data, read a record.  That gives you CUSTOMER_ID.  Use the first hash table to look up the AGREEMENT_ID, then use the second hash table to retrieve the remaining fields.  Be sure to pay attention to the possibility of "no match" being found at each point.

 

Even if the problem isn't that complicated, and you only need to match Table1 and Table2, this can be done.  Subsetting Table1 would become a relatively easy step, and only Table2 would need to be loaded into a hash table.

 

Good luck.

Reeza
Super User

If you're looking for the latest implementation date, that's doable. The 'easy' way to do this in SAS is to sort the data so that the record you want is the first or last record per group and then use the BY processing to isolate that record. 

 

BY processing and the first/last automatic variables are very powerful, and often something that's 'difficult' to replicate in SQL so it's definitely worth learning. By difficult I mean it takes more steps and isn't as intuitive. 

 

I think @ballardw solution implements this methodology. 

 

Good Luck with your SAS programming. 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1418 views
  • 0 likes
  • 4 in conversation