Re: selecting common id

Steelers_In_DC · Posted 01-11-2017 10:08 AM

I have the below dataset. Acctnbr is the customer current number. Vintage_id is a derived field pulling from several different possible cross reference account ids accross different systems. I want to get a common 'cust_id' from this table. I tried to do min or max(vintage_id) group by acctnbr but it is not consistent. In this example I would want the vintage_id ending in 2532 to be the customer id for each acctnbr because it is the only one that goes with all three acctnbr's.

Have:

ACCTNBR	VINTAGE_ID
4311963130842532	4311963130842532
4311963138028183	4311963130842532
4311963138028183	4311963138028183
4311963138028183	4311963139332832
4311963139332832	4311963130842532
4311963139332832	4311963138028183
4311963139332832	4311963139332832

Want:

ACCTNBR	cust_id
4311963130842532	4311963130842532
4311963138028183	4311963130842532
4311963139332832	4311963130842532

If there are multiple vintage_id's that match each acctnbr it doesn't matter which one is selected. I suppose I would prefer a min to stay consistent but it is arbitrary.

Any thoughts or suggestions on how to do this?

edit: This comes from a very large file with 100 million+ of acctnbrs, the number of acctnbrs and vintage_id's per acctnbr will not be consistent

ballardw · Posted 01-11-2017 10:13 AM

@Steelers_In_DC wrote:

If there are multiple vintage_id's that match each acctnbr it doesn't matter which one is selected. I suppose I would prefer a min to stay consistent but it is arbitrary.

By "min" do you mean the minimum numeric value of the vintage_id or the minimum count?

Steelers_In_DC · Posted 01-11-2017 10:17 AM

When I wrote it I was thinking min numeric value, but if min/max count is more efficient I would just as soon go with that.

Steelers_In_DC · Posted 01-11-2017 10:37 AM

I am trying this on my own and ran into another situation. I apologize for not having thought of this before. If there are two options for cust_id I would need to select the one with the min(opendate) to get the earliest record.

ACCTNBR	VINTAGE_ID	OPENDATE
4311963130842532	4311963130842532	07Jun2013
4311963138028183	4311963130842532	25Nov2015
4311963138028183	4311963138028183	25Nov2015
4311963138028183	4311963139332832	25Nov2015
4311963139332832	4311963130842532	07Jun2013
4311963139332832	4311963138028183	07Jun2013
4311963139332832	4311963139332832	07Jun2013

Steelers_In_DC · Posted 01-11-2017 01:22 PM

The solution I found:

proc sql;
create table vintage_id as
select distinct a.acctnbr,a.vintage_id,b.opendate
from start a left join
start b on
a.vintage_id = b.acctnbr
order by acctnbr,opendate;

/*Transpose step will pull the most recent opendate into col1, this will be used for cust_id*/

proc transpose data=vintage_id out=tran_vintage(rename=(col1=cust_id)drop=_NAME_ );by acctnbr;var vintage_id;

selecting common id