I have the below dataset. Acctnbr is the customer current number. Vintage_id is a derived field pulling from several different possible cross reference account ids accross different systems. I want to get a common 'cust_id' from this table. I tried to do min or max(vintage_id) group by acctnbr but it is not consistent. In this example I would want the vintage_id ending in 2532 to be the customer id for each acctnbr because it is the only one that goes with all three acctnbr's.
Have:
| ACCTNBR | VINTAGE_ID |
| 4311963130842532 | 4311963130842532 |
| 4311963138028183 | 4311963130842532 |
| 4311963138028183 | 4311963138028183 |
| 4311963138028183 | 4311963139332832 |
| 4311963139332832 | 4311963130842532 |
| 4311963139332832 | 4311963138028183 |
| 4311963139332832 | 4311963139332832 |
Want:
| ACCTNBR | cust_id |
| 4311963130842532 | 4311963130842532 |
| 4311963138028183 | 4311963130842532 |
| 4311963139332832 | 4311963130842532 |
If there are multiple vintage_id's that match each acctnbr it doesn't matter which one is selected. I suppose I would prefer a min to stay consistent but it is arbitrary.
Any thoughts or suggestions on how to do this?
edit: This comes from a very large file with 100 million+ of acctnbrs, the number of acctnbrs and vintage_id's per acctnbr will not be consistent
@Steelers_In_DC wrote:
If there are multiple vintage_id's that match each acctnbr it doesn't matter which one is selected. I suppose I would prefer a min to stay consistent but it is arbitrary.
By "min" do you mean the minimum numeric value of the vintage_id or the minimum count?
When I wrote it I was thinking min numeric value, but if min/max count is more efficient I would just as soon go with that.
I am trying this on my own and ran into another situation. I apologize for not having thought of this before. If there are two options for cust_id I would need to select the one with the min(opendate) to get the earliest record.
| ACCTNBR | VINTAGE_ID | OPENDATE |
| 4311963130842532 | 4311963130842532 | 07Jun2013 |
| 4311963138028183 | 4311963130842532 | 25Nov2015 |
| 4311963138028183 | 4311963138028183 | 25Nov2015 |
| 4311963138028183 | 4311963139332832 | 25Nov2015 |
| 4311963139332832 | 4311963130842532 | 07Jun2013 |
| 4311963139332832 | 4311963138028183 | 07Jun2013 |
| 4311963139332832 | 4311963139332832 | 07Jun2013 |
The solution I found:
proc sql;
create table vintage_id as
select distinct a.acctnbr,a.vintage_id,b.opendate
from start a left join
start b on
a.vintage_id = b.acctnbr
order by acctnbr,opendate;
/*Transpose step will pull the most recent opendate into col1, this will be used for cust_id*/
proc transpose data=vintage_id out=tran_vintage(rename=(col1=cust_id)drop=_NAME_ );by acctnbr;var vintage_id;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and save with the early bird rate—just $795!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.