DATA Step, Macro, Functions and more

Selecting Max value in Data Step

Reply
Contributor
Posts: 20

Selecting Max value in Data Step

I need to join a few tables together where I need to identify the maximum value, if I was using SQL I would use an inner join max group by statement to find the most current agreement_id in table 1 for each customer store in a temp table and join it back to table 2, struggling to come up with a Data Step that would accomplish same prior 

Table 1

Customer_id      Agreement_id

1234                  1

1234                  2

1234                  3

1235                  4

 

Table 2             

Agreement_Id  Contract_id     Implementation_Date         Expiration_date

1                        1                      1/1/2016 0:00                   1/1/2018 0:00

2                        2                      1/1/1994 0:00                   NULL

3                        1                      1/1/2004 0:00                   1/1/2020 0:00

4                        1                      1/7/2016 0:00                   NULL

Super User
Posts: 19,867

Re: Selecting Max value in Data Step

So why not use Proc SQL?

Are those your input data sets? If so , please post expected output.A data step solution is also possible, but I think SQL may be easier in this case as well.
Contributor
Posts: 20

Re: Selecting Max value in Data Step

This is just one of many data steps and most has been written in BASE already Reeza, plus honestly I'd just like to learn how to do it in SAS vs PROC SQL as I'm sure there is a way to do it.

 

The output data would be.

 

Customer_Id    Agreement_id  Contract_id    Implementation_date   Expiration_date

1234                 3                       1                    1/1/2004                      1/1/2020

1235                 4                       1                    1/7/2016                       NULL

Super User
Posts: 11,343

Re: Selecting Max value in Data Step

[ Edited ]

Maximum value of what?

If Agreement_id is what you are looking for, then Are your agreement_id values actually numeric? If they aren't numeric "maximum" may be a tad ambiguous.

 

Here's my guess of what you may be asking. Note: for example data I made your datetime values dates and replaced NULL with SAS missings.

 

data Set1;
input Customer_id      Agreement_id ;
datalines;
1234                  1
1234                  2
1234                  3
1235                  4
;
run;
 
data Set2;
   informat Implementation_Date    mmddyy.     Expiration_date mmddyy. ;
   format Implementation_Date    mmddyy10.     Expiration_date mmddyy10. ;
   input Agreement_Id  Contract_id     Implementation_Date       Expiration_date  ;
datalines;
1                        1                      1/1/2016                    1/1/2018 
2                        2                      1/1/1994                    .
3                        1                      1/1/2004                    1/1/2020 
4                        1                      1/7/2016                    .
;
run;
/* I'm guessing the customer data and contract data are matched using the single key agreement_id*/
proc sort data=set1;by agreement_id;run;
proc sort data=set2;by agreement_id;run;

data merged;
   merge set1 set2;
   by agreement_id;
run;
/* may not be needed for example data but the order will be important in the next step
with more complex data*/
proc sort data=merged; by customer_id agreement_id;run;
/* LAST value of a sorted group should be the record associated with the max value*/
data want; set merged; by customer_id agreement_id; if last.customer_id; run;
Contributor
Posts: 20

Re: Selecting Max value in Data Step

These are integer values

Super User
Posts: 5,516

Re: Selecting Max value in Data Step

I can give you an approach, but I don't have time to write the code.  I also may be overcomplicating the problem, assuming that there will be a third table not illustrated here with records for a subset of CUSTOMER_IDs.

 

Sort Table1 by CUSTOMER_ID AGREEMENT_ID.

 

Create hash table #1:  Map from CUSTOMER_ID to AGREEMENT_ID.  When loading the hash table, use the REPLACE method so you are left with one entry per CUSTOMER_ID, with the maximum AGREEMENT_ID.

 

Create hash table #2:  Map from AGREEMENT_ID to the other data elements in Table2.

 

From your unnamed table3 data, read a record.  That gives you CUSTOMER_ID.  Use the first hash table to look up the AGREEMENT_ID, then use the second hash table to retrieve the remaining fields.  Be sure to pay attention to the possibility of "no match" being found at each point.

 

Even if the problem isn't that complicated, and you only need to match Table1 and Table2, this can be done.  Subsetting Table1 would become a relatively easy step, and only Table2 would need to be loaded into a hash table.

 

Good luck.

Super User
Posts: 19,867

Re: Selecting Max value in Data Step

If you're looking for the latest implementation date, that's doable. The 'easy' way to do this in SAS is to sort the data so that the record you want is the first or last record per group and then use the BY processing to isolate that record. 

 

BY processing and the first/last automatic variables are very powerful, and often something that's 'difficult' to replicate in SQL so it's definitely worth learning. By difficult I mean it takes more steps and isn't as intuitive. 

 

I think @ballardw solution implements this methodology. 

 

Good Luck with your SAS programming. 

Ask a Question
Discussion stats
  • 6 replies
  • 249 views
  • 0 likes
  • 4 in conversation