DATA Step, Macro, Functions and more

selecting common id

Reply
Valued Guide
Posts: 858

selecting common id

[ Edited ]

I have the below dataset.  Acctnbr is the customer current number.  Vintage_id is a derived field pulling from several different possible cross reference account ids accross different systems.  I want to get a common 'cust_id' from this table.  I tried to do min or max(vintage_id) group by acctnbr but it is not consistent.  In this example I would want the vintage_id ending in 2532 to be the customer id for each acctnbr because it is the only one that goes with all three acctnbr's. 

 

Have:

ACCTNBR VINTAGE_ID
4311963130842532 4311963130842532
4311963138028183 4311963130842532
4311963138028183 4311963138028183
4311963138028183 4311963139332832
4311963139332832 4311963130842532
4311963139332832 4311963138028183
4311963139332832 4311963139332832

 

Want:

 

ACCTNBR cust_id
4311963130842532 4311963130842532
4311963138028183 4311963130842532
4311963139332832 4311963130842532

 

If there are multiple vintage_id's that match each acctnbr it doesn't matter which one is selected.  I suppose I would prefer a min to stay consistent but it is arbitrary. 

 

Any thoughts or suggestions on how to do this?

 

edit:  This comes from a very large file with 100 million+ of acctnbrs, the number of acctnbrs and vintage_id's per acctnbr will not be consistent

Super User
Posts: 10,538

Re: selecting common id


Steelers_In_DC wrote:

If there are multiple vintage_id's that match each acctnbr it doesn't matter which one is selected.  I suppose I would prefer a min to stay consistent but it is arbitrary. 

 

 


By "min" do you mean the minimum numeric value of the vintage_id or the minimum count?

Valued Guide
Posts: 858

Re: selecting common id

When I wrote it I was thinking min numeric value, but if min/max count is more efficient I would just as soon go with that.

Valued Guide
Posts: 858

Re: selecting common id

[ Edited ]

I am trying this on my own and ran into another situation.  I apologize for not having thought of this before.  If there are two options for cust_id I would need to select the one with the min(opendate) to get the earliest record. 

 

ACCTNBR VINTAGE_ID OPENDATE
4311963130842532 4311963130842532 07Jun2013
4311963138028183 4311963130842532 25Nov2015
4311963138028183 4311963138028183 25Nov2015
4311963138028183 4311963139332832 25Nov2015
4311963139332832 4311963130842532 07Jun2013
4311963139332832 4311963138028183 07Jun2013
4311963139332832 4311963139332832 07Jun2013
Valued Guide
Posts: 858

Re: selecting common id

The solution I found:

 

proc sql;
create table vintage_id as
select distinct a.acctnbr,a.vintage_id,b.opendate
from start a left join
     start b on
a.vintage_id = b.acctnbr
order by acctnbr,opendate;

/*Transpose step will pull the most recent opendate into col1, this will be used for cust_id*/

proc transpose data=vintage_id out=tran_vintage(rename=(col1=cust_id)drop=_NAME_ );by acctnbr;var vintage_id;

Ask a Question
Discussion stats
  • 4 replies
  • 118 views
  • 0 likes
  • 2 in conversation