Help using Base SAS procedures

Matching a record based on Date

Reply
Contributor
Posts: 24

Matching a record based on Date

Hi All,

I have customer data that contains date joined with site details. But the problem with the dataset is incorrect account data as the mathing process has picked incorrect data. Below is a table showing the dataset that contains incorrect data and what the data needs to look like;

Scenario: Customer buys a product but transfer it to someone else. The problem here is the sale has been allocated the wrong account number, the correct account number should be 100 and not 3000.

I need help with code that can match the UniqueProductID as the reference but to then look at the sale date and match it to the correct record in the second table. To put a little more detail, a "uniqueproductID" that is transferred will get an account number but it must be matched to the correct account number table. I'm guessing the logic will basically look at date ranges, so account no 3000 started the warranty in June and 100 started in February. The problem is there is no standard number of days after the product has been sold but we can identify the roughly which application was sold in relation to its first usage date.

AccountNoUniqueProductIDSaleDateSaleID
3000161200015/02/2011ABXD1
3000161200005/05/2011AIIIU2

AccountNoUniqueProductIDUsageDate
100161200020/02/2011
3000161200003/06/2011

I hope this is clear and unfortuantely there is no saleID against the table that contains all historical account numbers which is why the usage date must be linked into SaleDate (the range of the date in relation to the next one).

Thanks

Respected Advisor
Posts: 3,156

Re: Matching a record based on Date

Posted in reply to thegraduate

data h1;

input (AccountNo UniqueProductID) (:$10.) SaleDate ddmmyy10. SaleID:$10.;

format SaleDate ddmmyy10. ;

cards;

3000 1612000 15/02/2011 ABXD1

3000 1612000 05/05/2011 AIIIU2

;

data h2;

input (AccountNo UniqueProductID) (:$10.) usageDate ddmmyy10.;

format usageDate ddmmyy10.;

cards;

100 1612000 20/02/2011

3000 1612000 03/06/2011

;

/*if your real data lays just as what you have shown: sorted, 1:1 ratio within the same accountno, then:*/

data want;

merge h1 h2;

by uniqueproductid;

run;

/*otherwise, not as efficient, but the following should work*/

proc sql;

create table want as

select a.accountno, a.UniqueProductID, saledate, usagedate, saleid from

h2 a

left join h1 b

on a.UniqueProductID=b.UniqueProductID

group by a.accountno

having usagedate-saledate=min(abs(usagedate-saledate)) and

usagedate-saledate >=0;

quit;

proc print;run;

Haikuo

Contributor
Posts: 24

Re: Matching a record based on Date

Thanks for the reply Haikuo,

That didn't seem to work as it showed both account numbers for that sale. What you have to remember is, the account number has wrongly been allocated already. Is it possible to create a variable that says "correct" or "incorrect"?

Respected Advisor
Posts: 3,156

Re: Matching a record based on Date

Posted in reply to thegraduate

The best way to get the answer you want on the forum is to lay out 1) your inputting data, you already have this. 2) your output data. At this point, please provide an example of what you are exactly expecting for.

Haikuo

Super User
Posts: 10,035

Re: Matching a record based on Date

Posted in reply to thegraduate

But if there were multi observations in Second dataset between the date and  the next one at the First dataset,

What you are going to do?

Assuming You only want the closest date .

data h1;
input (AccountNo UniqueProductID) (:$10.) SaleDate ddmmyy10. SaleID:$10.;
format SaleDate ddmmyy10. ;
cards;
3000 1612000 15/02/2011 ABXD1
3000 1612000 05/05/2011 AIIIU2
;
run;
data h2;
input (AccountNo UniqueProductID) (:$10.) usageDate ddmmyy10.;
format usageDate ddmmyy10.;
cards;
100 1612000 20/02/2011
3000 1612000 03/06/2011
;
run;

proc sql;
 create table want as
  select b.AccountNo,a.UniqueProductID,a.SaleDate,a.SaleID
   from h1 as a,h2 as b
    where a.AccountNo=b.AccountNo 
     group by a.AccountNo,SaleDate 
      having usageDate-SaleDate=min(abs(usageDate-SaleDate));
quit;

Ksharp

Frequent Contributor
Posts: 101

Re: Matching a record based on Date

Posted in reply to thegraduate

If I understand you correctly, the account numbers in the first table listed in your post are incorrect. You need to match the uniqueproductid from the first table to the second table and return the accountno from the second table that has the closest usagedate after the saledate. I think this may be what you are looking for. I added another product to your sales data without a matching warranty record.

data sales;
input UniqueProductID :$10. SaleDate ddmmyy10. SaleID:$10.;
format SaleDate ddmmyy10. ;
cards;
1612000 15/02/2011 ABXD1
1612000 05/05/2011 AIIIU2
1613333 01/04/2012 XXXXX
;

data warranties;
input (AccountNo UniqueProductID) (:$10.) usageDate ddmmyy10.;
format usageDate ddmmyy10.;
cards;
100 1612000 20/02/2011
3000 1612000 03/06/2011
;

proc sql;
create table want as
select
   b.accountno,
   a.*
from
   sales  a
   left join
   warranties  b
   on  
      a.uniqueproductid = b.uniqueproductid and
      b.usagedate = (select min( usagedate )
                     from
                        warranties
                     where
                        usagedate > a.saledate )
;
quit;

Contributor
Posts: 24

Re: Matching a record based on Date

Posted in reply to thegraduate

Thank you all for helping me find the solution. To answer Hai, the following output is required and is already present in the initial post;

AccountNoUniqueproductIDSaleDate
3000161200015/02/2011
3000161200005/05/2011
87281400028806/09/2012
87281400028806/01/2013

AccountNoUniqueproductIDUsageDate
3000161200003/06/2011
100161200020/02/2011
87281400028801/10/2012

So, the output will show the correct account number for record one (100) and not 3000. I have also added one more scenario, if a sale is made on a product but the account has not been created, it should blank out the account no. So Essentially the sale must be matched to the correct account number

Frequent Contributor
Posts: 101

Re: Matching a record based on Date

Posted in reply to thegraduate

The same solution I offered earlier still works. Try it:

data sales;
input UniqueProductID :$10. SaleDate ddmmyy10. SaleID:$10.;
format SaleDate ddmmyy10. ;
cards;
1612000 15/02/2011 ABXD1
1612000 05/05/2011 AIIIU2
14000288 06/09/2012 XXXXX
14000288 06/01/2013 XXXXX
;

run;

data warranties;
input (AccountNo UniqueProductID) (:$10.) usageDate ddmmyy10.;
format usageDate ddmmyy10.;
cards;
100 1612000 20/02/2011
3000 1612000 03/06/2011
8728 14000288 01/10/2012
;

run;

proc sql;
create table want as
select
   b.accountno,
   a.*
from
   sales  a
   left join
   warranties  b
   on  
      a.uniqueproductid = b.uniqueproductid and
      b.usagedate = (select min( usagedate )
                     from
                        warranties
                     where
                        usagedate > a.saledate )
;
quit;

Contributor
Posts: 24

Re: Matching a record based on Date

Posted in reply to thegraduate

All,

I have tried the following code;

PROC SQL;

create table identifyAccNo AS

     select b.account_no, b.meter_number, b.invoice_date from

     work.MasterSupplyStartDate_ b

     left join bhyatm.ACQUISITIONMASTERACNO_20120326 A on a.meter_number=b.meter_number

     group by b.account_no

     having invoice_date-finaldatecombined=min(abs(invoice_date-finaldatecombined)) And

     invoice_date-finaldatecombined>=0;

quit;

So Currently an example of what is happening with the above code;

table A (sale data):

AccountNoMeter_Numberfinaldatecombined(sale date)SaleID
800005539381011/06/2011FX111

table B (supply data)

AccountNoMeter_NumberInvoice_Date
310005539381004/10/2011
390005539381024/10/2011

The code mentioned at the top is bringing back both records (i.e. it is duplicating the saleid with both account numbers), this is shown below. I only want it to bring back 31000 which is the closest account number to the invoice date.

AccountNoMeter_Numberfinaldatecombined(sale date)SaleID
310005539381011/06/2011FX111
390005539381011/06/2011FX111
Frequent Contributor
Posts: 101

Re: Matching a record based on Date

Posted in reply to thegraduate

Here's my third attempt to get you to try this. Trust me. It works. I modified it to use the data structure of your last post, but the concept is the same as the other 2 times I offered this solution.

data A;
infile cards;
input accountno meter_number finaldatecombined ddmmyy10. saleid $;
format finaldatecombined ddmmyy10.;
cards;
80000 55393810 11/06/2011 FX111
;
run;

data B;
infile cards;
input accountno meter_number invoice_date ddmmyy10.;
format invoice_date ddmmyy10.;
cards;
31000 55393810 04/10/2011
39000 55393810 24/10/2011
;
run;

proc sql;
create table identifyaccno as
select
   b.accountno,
   a.meter_number,
   a.finaldatecombined,
   b.invoice_date,
   a.saleid
from
   a
   left join
   b
   on  
      a.meter_number = b.meter_number and
      b.invoice_date = (select min( invoice_date )
                     from
                        b
                     where
                        b.invoice_date > a.finaldatecombined )
;
quit;

Contributor
Posts: 24

Re: Matching a record based on Date

Posted in reply to SAS_Bigot

Mr Bigot,

I have tried your code and it returned blank values for account_no and invoice_Date;

proc sql;

     create table identifyaccno as

           select

                b.account_no,

                a.meter_number,

                a.finaldatecombined,

                b.invoice_date,

                a.salesforce_id

           from

                bhyatm.ACQUISITIONMASTERACNO_20120326 a

           left join

                MasterSupplyStartDate_ b

                on  

                a.meter_number = b.meter_number and

                b.invoice_date = (select min(invoice_date)

           from

                MasterSupplyStartDate_ b

           where

                b.invoice_date > a.finaldatecombined)

     ;

quit;

Have I made a mistake?

Frequent Contributor
Posts: 101

Re: Matching a record based on Date

Posted in reply to thegraduate

In an earlier post, you mentioned that if a sale has been made but the account_no has not been created yet that the account_no should be blank. The SQL i wrote has an outer join (left join) so that if there are no matching account numbers they would be blank. For those records with a missing account_no, have you confirmed that the meter has a valid match on the b table?


Ask a Question
Discussion stats
  • 11 replies
  • 505 views
  • 0 likes
  • 4 in conversation