Solved: Re: Identify previous interactions between two firms

azsham · Posted 06-30-2022 04:38 PM

Hello,

I would like to calculate relationship measures between two firms. Unit of the data: transaction id. The data and desired variables look like:

Transactionid Date FirmA FirmB RelationshipDummy RelationshipCount

1 06/30/2000 11111 10004 . .

2 09/16/2000 11111 15266 0 0

3 09/30/2000 11111 10004 1 1

4 11/01/2000 11111 17789 0 0

5 01/30/2001 11111 10004 1 2

6 06/30/2002 11111 15266 1 1

7 02/04/2010 11111 10004 0 0

8 06/30/2000 14551 10004 .

9 09/11/2002 14551 10004 1 1

10 01/08/2003 14551 10004 1 2

11 02/22/2005 14551 10004 1 3

Relationship Dummy tells if Firm A and Firm B had a relationship in the previous five years or not.

Relationship Count tells how many times did Firm A and Firm B make transactions in the previous five years.

I really do not know how this could be coded. So, your help is much appreciated!

Thanks!

mkeintz · Posted 07-01-2022 09:14 AM

Your data are sorted by FIRMA/DATE, but your interaction history is based on FIRMA/FIRMB. This is a good situation to use a hash object keyed on FIRMA/FIRMB:


data have;
  input Transactionid         Date :mmddyy10.    FirmA :$5.    FirmB :$5.     expected_rd expected_rc;
  format date date9.;
datalines;
 1     06/30/2000      11111     10004               .                                           .
 2     09/16/2000      11111     15266               0                                          0
 3     09/30/2000      11111     10004               1                                          1
 4     11/01/2000      11111     17789               0                                          0
 5     01/30/2001      11111     10004               1                                          2
 6     06/30/2002      11111     15266               1                                          1
 7     02/04/2010      11111     10004               0                                          0
 8     06/30/2000      14551    10004               .                                          .
 9     09/11/2002      14551    10004               1                                          1
10     01/08/2003      14551    10004               1                                          2
11     02/22/2005      14551    10004               1                                          3
run; 

data want (drop=d _:);
  set have;
  by firma date;
  array _hist{20};  format _hist: date9. ;
  if _n_=1 then do;
    call missing(_dmin,_dmax);
    declare hash h ();
      h.definekey('firma','firmb');
      h.definedata('_dmin','_dmax');
      do d=1 to dim(_hist);
        h.definedata(cats('_hist',d));
      end;
      h.definedone();
  end;
  else if first.firma then h.clear();
  if h.find()^=0 then do;
     _dmin=1;
     _dmax=1;
  end;

  else do;
    _cutoff_date=intnx('year',date,-5,'same');
    do d=_dmin to _dmax while (_hist{d}<_cutoff_date);
      _hist{d}=.;
    end;
    _dmin=d;
    _dmax=_dmax+1; 
  end;
  _hist{_dmax}=date;
  h.replace();
  rcount=_dmax-_dmin;
  rdummy=(rcount)>0;
  if first.firma then call missing(rdummy,rcount);
run;

Make the size of array _HIST large enough to accommodate the longest expected history. I made it 20.

BTW, it's not clear to me why RCOUNT and RDUMMY should be 0's for the initial record of each FIRMA/FIRMB combination - EXCEPT for the earliest record of FIRMA, when you have RCOUNT and RDUMMY as missing. However, that feature is replicated above - i.e. my RCOUNT equals you original EXPECTED_RC, and RDUMMY=EXPECTED_RD.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

ballardw · Posted 06-30-2022 04:56 PM

Some questions to consider:

Does the firm that appears as a value in FirmB, such as 10004 ever also appear as a FirmA elsewhere? Or is the ORDER of the firm ids important to the process? In other words does a transaction with 11111 as FirmA and 10004 as FirmB have to be considered with 10004 as FirmA and 11111 as FirmB?

Since it appears you may want to consider things in order then the above question is needed to set in which order things are used.

Over what duration is the Relationship count calculated. You are showing at least 10 years of transaction dates. If a pair of firms have one transaction a year in each of 10 years what would you expect the Relationship count to show? Or once every three years? Your description does not clearly describe the interaction with the count and the "five year" requirement.

azsham · Posted 06-30-2022 05:54 PM

@ballardw wrote:
Some questions to consider:
Does the firm that appears as a value in FirmB, such as 10004 ever also appear as a FirmA elsewhere? Or is the ORDER of the firm ids important to the process? In other words does a transaction with 11111 as FirmA and 10004 as FirmB have to be considered with 10004 as FirmA and 11111 as FirmB?

Since it appears you may want to consider things in order then the above question is needed to set in which order things are used.

Over what duration is the Relationship count calculated. You are showing at least 10 years of transaction dates. If a pair of firms have one transaction a year in each of 10 years what would you expect the Relationship count to show? Or once every three years? Your description does not clearly describe the interaction with the count and the "five year" requirement.

Q1:

Does the firm that appears as a value in FirmB, such as 10004 ever also appear as a FirmA elsewhere?

Answer:

No. Firm ids do not appear in the other column. Firm A shows retail firms ids while Firm B column shows Suppliers. Thus, once a firm appeared in Firm A column, I know for sure it is not going to appear in column Firm B.

Q2:

If a pair of firms have one transaction a year in each of 10 years what would you expect the Relationship count to show?

Answer:

Here is an example that answers your question:

Transaction id Firm A Firm B Date Relationship count

1 188 1414 2001 .

2 188 1414 2002 1

3 188 1414 2003 2

4 188 1414 2004 3

5 188 1414 2005 4

6 188 1414 2006 5

7 188 1414 2007 5

8 188 1414 2008 5

9 188 1414 2009 5

10 188 1414 2010 5

Q3:

Or once every three years?

Answer: relationship count would show 1 interaction in the previous five years as the interaction between these two firms happens every 3 years. Here is an example:

Transaction id Firm A Firm B Date Relationship count

1 188 1414 2001 .

2 188 1414 2004 1

3 188 1414 2007 1

4 188 1414 2010 1

mkeintz · Posted 07-01-2022 09:14 AM

Your data are sorted by FIRMA/DATE, but your interaction history is based on FIRMA/FIRMB. This is a good situation to use a hash object keyed on FIRMA/FIRMB:


data have;
  input Transactionid         Date :mmddyy10.    FirmA :$5.    FirmB :$5.     expected_rd expected_rc;
  format date date9.;
datalines;
 1     06/30/2000      11111     10004               .                                           .
 2     09/16/2000      11111     15266               0                                          0
 3     09/30/2000      11111     10004               1                                          1
 4     11/01/2000      11111     17789               0                                          0
 5     01/30/2001      11111     10004               1                                          2
 6     06/30/2002      11111     15266               1                                          1
 7     02/04/2010      11111     10004               0                                          0
 8     06/30/2000      14551    10004               .                                          .
 9     09/11/2002      14551    10004               1                                          1
10     01/08/2003      14551    10004               1                                          2
11     02/22/2005      14551    10004               1                                          3
run; 

data want (drop=d _:);
  set have;
  by firma date;
  array _hist{20};  format _hist: date9. ;
  if _n_=1 then do;
    call missing(_dmin,_dmax);
    declare hash h ();
      h.definekey('firma','firmb');
      h.definedata('_dmin','_dmax');
      do d=1 to dim(_hist);
        h.definedata(cats('_hist',d));
      end;
      h.definedone();
  end;
  else if first.firma then h.clear();
  if h.find()^=0 then do;
     _dmin=1;
     _dmax=1;
  end;

  else do;
    _cutoff_date=intnx('year',date,-5,'same');
    do d=_dmin to _dmax while (_hist{d}<_cutoff_date);
      _hist{d}=.;
    end;
    _dmin=d;
    _dmax=_dmax+1; 
  end;
  _hist{_dmax}=date;
  h.replace();
  rcount=_dmax-_dmin;
  rdummy=(rcount)>0;
  if first.firma then call missing(rdummy,rcount);
run;

Make the size of array _HIST large enough to accommodate the longest expected history. I made it 20.

BTW, it's not clear to me why RCOUNT and RDUMMY should be 0's for the initial record of each FIRMA/FIRMB combination - EXCEPT for the earliest record of FIRMA, when you have RCOUNT and RDUMMY as missing. However, that feature is replicated above - i.e. my RCOUNT equals you original EXPECTED_RC, and RDUMMY=EXPECTED_RD.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

azsham · Posted 07-01-2022 05:36 PM

Wow! your code seems to do the job. Thank you very so much, @mkeintz !

I will have to double-check rcount and rdummy output, but they seem to be what I want!

To answer your question:

If firma appears for the first time then rdummy and rcount should be missing (assume firmb id here is 99999). However, if firma chooses another firmb in the next available observation in my data (e.g. firmb now is 12345), then this tells that they did not choose their previous relationship with 99999 (firmb) to by the product. This indicates that this second transaction is not a relationship transaction for firma.

I choose this argument so I do not lose much of the data as missings if I want all new combinations to be coded as missing. However, how would the code change if we were to assign missing for new combinations? I think it is important to confirm that the analysis results are not affected by the way I construct these two variables.

Thank you very much!

mkeintz · Posted 07-01-2022 05:50 PM

@azsham

I understand your reasoning about why only the first date over all firmb's for a given firma has missing rdummy and rcount.

But then, will your research be satisfied with only maintaining within-firmb values of rdummy and rcount? Might not you want to also generate rdummy_global and rcount_global for each firma accross all firmb's? It would be easy to do.

In additional to possible rdummy_global and rcount_global variables, you might also want to have a count of distinct firmb's in the last five years - i.e. how much has firma diversified its vendors over the last x years.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Identify previous interactions between two firms

Re: Identify previous interactions between two firms

Re: Identify previous interactions between two firms

Re: Identify previous interactions between two firms

Re: Identify previous interactions between two firms

Re: Identify previous interactions between two firms

Re: Identify previous interactions between two firms

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away