BookmarkSubscribeRSS Feed
thegraduate
Calcite | Level 5

Hi All,

I have customer data that contains date joined with site details. But the problem with the dataset is incorrect account data as the mathing process has picked incorrect data. Below is a table showing the dataset that contains incorrect data and what the data needs to look like;

Scenario: Customer buys a product but transfer it to someone else. The problem here is the sale has been allocated the wrong account number, the correct account number should be 100 and not 3000.

I need help with code that can match the UniqueProductID as the reference but to then look at the sale date and match it to the correct record in the second table. To put a little more detail, a "uniqueproductID" that is transferred will get an account number but it must be matched to the correct account number table. I'm guessing the logic will basically look at date ranges, so account no 3000 started the warranty in June and 100 started in February. The problem is there is no standard number of days after the product has been sold but we can identify the roughly which application was sold in relation to its first usage date.

AccountNoUniqueProductIDSaleDateSaleID
3000161200015/02/2011ABXD1
3000161200005/05/2011AIIIU2

AccountNoUniqueProductIDUsageDate
100161200020/02/2011
3000161200003/06/2011

I hope this is clear and unfortuantely there is no saleID against the table that contains all historical account numbers which is why the usage date must be linked into SaleDate (the range of the date in relation to the next one).

Thanks

11 REPLIES 11
Haikuo
Onyx | Level 15

data h1;

input (AccountNo UniqueProductID) (:$10.) SaleDate ddmmyy10. SaleID:$10.;

format SaleDate ddmmyy10. ;

cards;

3000 1612000 15/02/2011 ABXD1

3000 1612000 05/05/2011 AIIIU2

;

data h2;

input (AccountNo UniqueProductID) (:$10.) usageDate ddmmyy10.;

format usageDate ddmmyy10.;

cards;

100 1612000 20/02/2011

3000 1612000 03/06/2011

;

/*if your real data lays just as what you have shown: sorted, 1:1 ratio within the same accountno, then:*/

data want;

merge h1 h2;

by uniqueproductid;

run;

/*otherwise, not as efficient, but the following should work*/

proc sql;

create table want as

select a.accountno, a.UniqueProductID, saledate, usagedate, saleid from

h2 a

left join h1 b

on a.UniqueProductID=b.UniqueProductID

group by a.accountno

having usagedate-saledate=min(abs(usagedate-saledate)) and

usagedate-saledate >=0;

quit;

proc print;run;

Haikuo

thegraduate
Calcite | Level 5

Thanks for the reply Haikuo,

That didn't seem to work as it showed both account numbers for that sale. What you have to remember is, the account number has wrongly been allocated already. Is it possible to create a variable that says "correct" or "incorrect"?

Haikuo
Onyx | Level 15

The best way to get the answer you want on the forum is to lay out 1) your inputting data, you already have this. 2) your output data. At this point, please provide an example of what you are exactly expecting for.

Haikuo

Ksharp
Super User

But if there were multi observations in Second dataset between the date and  the next one at the First dataset,

What you are going to do?

Assuming You only want the closest date .

data h1;
input (AccountNo UniqueProductID) (:$10.) SaleDate ddmmyy10. SaleID:$10.;
format SaleDate ddmmyy10. ;
cards;
3000 1612000 15/02/2011 ABXD1
3000 1612000 05/05/2011 AIIIU2
;
run;
data h2;
input (AccountNo UniqueProductID) (:$10.) usageDate ddmmyy10.;
format usageDate ddmmyy10.;
cards;
100 1612000 20/02/2011
3000 1612000 03/06/2011
;
run;

proc sql;
 create table want as
  select b.AccountNo,a.UniqueProductID,a.SaleDate,a.SaleID
   from h1 as a,h2 as b
    where a.AccountNo=b.AccountNo 
     group by a.AccountNo,SaleDate 
      having usageDate-SaleDate=min(abs(usageDate-SaleDate));
quit;

Ksharp

FloydNevseta
Pyrite | Level 9

If I understand you correctly, the account numbers in the first table listed in your post are incorrect. You need to match the uniqueproductid from the first table to the second table and return the accountno from the second table that has the closest usagedate after the saledate. I think this may be what you are looking for. I added another product to your sales data without a matching warranty record.

data sales;
input UniqueProductID :$10. SaleDate ddmmyy10. SaleID:$10.;
format SaleDate ddmmyy10. ;
cards;
1612000 15/02/2011 ABXD1
1612000 05/05/2011 AIIIU2
1613333 01/04/2012 XXXXX
;

data warranties;
input (AccountNo UniqueProductID) (:$10.) usageDate ddmmyy10.;
format usageDate ddmmyy10.;
cards;
100 1612000 20/02/2011
3000 1612000 03/06/2011
;

proc sql;
create table want as
select
   b.accountno,
   a.*
from
   sales  a
   left join
   warranties  b
   on  
      a.uniqueproductid = b.uniqueproductid and
      b.usagedate = (select min( usagedate )
                     from
                        warranties
                     where
                        usagedate > a.saledate )
;
quit;

thegraduate
Calcite | Level 5

Thank you all for helping me find the solution. To answer Hai, the following output is required and is already present in the initial post;

AccountNoUniqueproductIDSaleDate
3000161200015/02/2011
3000161200005/05/2011
87281400028806/09/2012
87281400028806/01/2013

AccountNoUniqueproductIDUsageDate
3000161200003/06/2011
100161200020/02/2011
87281400028801/10/2012

So, the output will show the correct account number for record one (100) and not 3000. I have also added one more scenario, if a sale is made on a product but the account has not been created, it should blank out the account no. So Essentially the sale must be matched to the correct account number

FloydNevseta
Pyrite | Level 9

The same solution I offered earlier still works. Try it:

data sales;
input UniqueProductID :$10. SaleDate ddmmyy10. SaleID:$10.;
format SaleDate ddmmyy10. ;
cards;
1612000 15/02/2011 ABXD1
1612000 05/05/2011 AIIIU2
14000288 06/09/2012 XXXXX
14000288 06/01/2013 XXXXX
;

run;

data warranties;
input (AccountNo UniqueProductID) (:$10.) usageDate ddmmyy10.;
format usageDate ddmmyy10.;
cards;
100 1612000 20/02/2011
3000 1612000 03/06/2011
8728 14000288 01/10/2012
;

run;

proc sql;
create table want as
select
   b.accountno,
   a.*
from
   sales  a
   left join
   warranties  b
   on  
      a.uniqueproductid = b.uniqueproductid and
      b.usagedate = (select min( usagedate )
                     from
                        warranties
                     where
                        usagedate > a.saledate )
;
quit;

thegraduate
Calcite | Level 5

All,

I have tried the following code;

PROC SQL;

create table identifyAccNo AS

     select b.account_no, b.meter_number, b.invoice_date from

     work.MasterSupplyStartDate_ b

     left join bhyatm.ACQUISITIONMASTERACNO_20120326 A on a.meter_number=b.meter_number

     group by b.account_no

     having invoice_date-finaldatecombined=min(abs(invoice_date-finaldatecombined)) And

     invoice_date-finaldatecombined>=0;

quit;

So Currently an example of what is happening with the above code;

table A (sale data):

AccountNoMeter_Numberfinaldatecombined(sale date)SaleID
800005539381011/06/2011FX111

table B (supply data)

AccountNoMeter_NumberInvoice_Date
310005539381004/10/2011
390005539381024/10/2011

The code mentioned at the top is bringing back both records (i.e. it is duplicating the saleid with both account numbers), this is shown below. I only want it to bring back 31000 which is the closest account number to the invoice date.

AccountNoMeter_Numberfinaldatecombined(sale date)SaleID
310005539381011/06/2011FX111
390005539381011/06/2011FX111
FloydNevseta
Pyrite | Level 9

Here's my third attempt to get you to try this. Trust me. It works. I modified it to use the data structure of your last post, but the concept is the same as the other 2 times I offered this solution.

data A;
infile cards;
input accountno meter_number finaldatecombined ddmmyy10. saleid $;
format finaldatecombined ddmmyy10.;
cards;
80000 55393810 11/06/2011 FX111
;
run;

data B;
infile cards;
input accountno meter_number invoice_date ddmmyy10.;
format invoice_date ddmmyy10.;
cards;
31000 55393810 04/10/2011
39000 55393810 24/10/2011
;
run;

proc sql;
create table identifyaccno as
select
   b.accountno,
   a.meter_number,
   a.finaldatecombined,
   b.invoice_date,
   a.saleid
from
   a
   left join
   b
   on  
      a.meter_number = b.meter_number and
      b.invoice_date = (select min( invoice_date )
                     from
                        b
                     where
                        b.invoice_date > a.finaldatecombined )
;
quit;

thegraduate
Calcite | Level 5

Mr Bigot,

I have tried your code and it returned blank values for account_no and invoice_Date;

proc sql;

     create table identifyaccno as

           select

                b.account_no,

                a.meter_number,

                a.finaldatecombined,

                b.invoice_date,

                a.salesforce_id

           from

                bhyatm.ACQUISITIONMASTERACNO_20120326 a

           left join

                MasterSupplyStartDate_ b

                on  

                a.meter_number = b.meter_number and

                b.invoice_date = (select min(invoice_date)

           from

                MasterSupplyStartDate_ b

           where

                b.invoice_date > a.finaldatecombined)

     ;

quit;

Have I made a mistake?

FloydNevseta
Pyrite | Level 9

In an earlier post, you mentioned that if a sale has been made but the account_no has not been created yet that the account_no should be blank. The SQL i wrote has an outer join (left join) so that if there are no matching account numbers they would be blank. For those records with a missing account_no, have you confirmed that the meter has a valid match on the b table?


sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1461 views
  • 0 likes
  • 4 in conversation