BookmarkSubscribeRSS Feed
Steelers_In_DC
Barite | Level 11

I have the below dataset.  Acctnbr is the customer current number.  Vintage_id is a derived field pulling from several different possible cross reference account ids accross different systems.  I want to get a common 'cust_id' from this table.  I tried to do min or max(vintage_id) group by acctnbr but it is not consistent.  In this example I would want the vintage_id ending in 2532 to be the customer id for each acctnbr because it is the only one that goes with all three acctnbr's. 

 

Have:

ACCTNBR VINTAGE_ID
4311963130842532 4311963130842532
4311963138028183 4311963130842532
4311963138028183 4311963138028183
4311963138028183 4311963139332832
4311963139332832 4311963130842532
4311963139332832 4311963138028183
4311963139332832 4311963139332832

 

Want:

 

ACCTNBR cust_id
4311963130842532 4311963130842532
4311963138028183 4311963130842532
4311963139332832 4311963130842532

 

If there are multiple vintage_id's that match each acctnbr it doesn't matter which one is selected.  I suppose I would prefer a min to stay consistent but it is arbitrary. 

 

Any thoughts or suggestions on how to do this?

 

edit:  This comes from a very large file with 100 million+ of acctnbrs, the number of acctnbrs and vintage_id's per acctnbr will not be consistent

4 REPLIES 4
ballardw
Super User

@Steelers_In_DC wrote:

If there are multiple vintage_id's that match each acctnbr it doesn't matter which one is selected.  I suppose I would prefer a min to stay consistent but it is arbitrary. 

 

 


By "min" do you mean the minimum numeric value of the vintage_id or the minimum count?

Steelers_In_DC
Barite | Level 11

When I wrote it I was thinking min numeric value, but if min/max count is more efficient I would just as soon go with that.

Steelers_In_DC
Barite | Level 11

I am trying this on my own and ran into another situation.  I apologize for not having thought of this before.  If there are two options for cust_id I would need to select the one with the min(opendate) to get the earliest record. 

 

ACCTNBR VINTAGE_ID OPENDATE
4311963130842532 4311963130842532 07Jun2013
4311963138028183 4311963130842532 25Nov2015
4311963138028183 4311963138028183 25Nov2015
4311963138028183 4311963139332832 25Nov2015
4311963139332832 4311963130842532 07Jun2013
4311963139332832 4311963138028183 07Jun2013
4311963139332832 4311963139332832 07Jun2013
Steelers_In_DC
Barite | Level 11

The solution I found:

 

proc sql;
create table vintage_id as
select distinct a.acctnbr,a.vintage_id,b.opendate
from start a left join
     start b on
a.vintage_id = b.acctnbr
order by acctnbr,opendate;

/*Transpose step will pull the most recent opendate into col1, this will be used for cust_id*/

proc transpose data=vintage_id out=tran_vintage(rename=(col1=cust_id)drop=_NAME_ );by acctnbr;var vintage_id;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1408 views
  • 0 likes
  • 2 in conversation