BookmarkSubscribeRSS Feed
Steelers_In_DC
Barite | Level 11

I have the below dataset.  Acctnbr is the customer current number.  Vintage_id is a derived field pulling from several different possible cross reference account ids accross different systems.  I want to get a common 'cust_id' from this table.  I tried to do min or max(vintage_id) group by acctnbr but it is not consistent.  In this example I would want the vintage_id ending in 2532 to be the customer id for each acctnbr because it is the only one that goes with all three acctnbr's. 

 

Have:

ACCTNBR VINTAGE_ID
4311963130842532 4311963130842532
4311963138028183 4311963130842532
4311963138028183 4311963138028183
4311963138028183 4311963139332832
4311963139332832 4311963130842532
4311963139332832 4311963138028183
4311963139332832 4311963139332832

 

Want:

 

ACCTNBR cust_id
4311963130842532 4311963130842532
4311963138028183 4311963130842532
4311963139332832 4311963130842532

 

If there are multiple vintage_id's that match each acctnbr it doesn't matter which one is selected.  I suppose I would prefer a min to stay consistent but it is arbitrary. 

 

Any thoughts or suggestions on how to do this?

 

edit:  This comes from a very large file with 100 million+ of acctnbrs, the number of acctnbrs and vintage_id's per acctnbr will not be consistent

4 REPLIES 4
ballardw
Super User

@Steelers_In_DC wrote:

If there are multiple vintage_id's that match each acctnbr it doesn't matter which one is selected.  I suppose I would prefer a min to stay consistent but it is arbitrary. 

 

 


By "min" do you mean the minimum numeric value of the vintage_id or the minimum count?

Steelers_In_DC
Barite | Level 11

When I wrote it I was thinking min numeric value, but if min/max count is more efficient I would just as soon go with that.

Steelers_In_DC
Barite | Level 11

I am trying this on my own and ran into another situation.  I apologize for not having thought of this before.  If there are two options for cust_id I would need to select the one with the min(opendate) to get the earliest record. 

 

ACCTNBR VINTAGE_ID OPENDATE
4311963130842532 4311963130842532 07Jun2013
4311963138028183 4311963130842532 25Nov2015
4311963138028183 4311963138028183 25Nov2015
4311963138028183 4311963139332832 25Nov2015
4311963139332832 4311963130842532 07Jun2013
4311963139332832 4311963138028183 07Jun2013
4311963139332832 4311963139332832 07Jun2013
Steelers_In_DC
Barite | Level 11

The solution I found:

 

proc sql;
create table vintage_id as
select distinct a.acctnbr,a.vintage_id,b.opendate
from start a left join
     start b on
a.vintage_id = b.acctnbr
order by acctnbr,opendate;

/*Transpose step will pull the most recent opendate into col1, this will be used for cust_id*/

proc transpose data=vintage_id out=tran_vintage(rename=(col1=cust_id)drop=_NAME_ );by acctnbr;var vintage_id;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 763 views
  • 0 likes
  • 2 in conversation