BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
azsham
Obsidian | Level 7

Hello,

I would like to calculate relationship measures between two firms. Unit of the data: transaction id. The data and desired variables look like:

 

Transactionid         Date           FirmA     FirmB     RelationshipDummy       RelationshipCount

          1             06/30/2000      11111     10004               .                                           .

          2             09/16/2000      11111     15266               0                                          0

          3             09/30/2000      11111     10004               1                                          1

          4             11/01/2000      11111     17789               0                                          0

          5             01/30/2001      11111     10004               1                                          2

          6             06/30/2002      11111     15266               1                                          1

          7             02/04/2010      11111     10004               0                                          0

 

          8             06/30/2000      14551    10004               .

          9             09/11/2002      14551    10004               1                                          1

          10           01/08/2003      14551    10004               1                                          2

          11           02/22/2005      14551    10004               1                                          3  

 

 

Relationship Dummy tells if Firm A and Firm B had a relationship in the previous five years or not.

Relationship Count tells how many times did Firm A and Firm B make transactions in the previous five years.

 

 

I really do not know how this could be coded. So, your help is much appreciated!

 

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Your data are sorted by FIRMA/DATE, but your interaction history is based on FIRMA/FIRMB.  This is a good situation to use a hash object keyed on FIRMA/FIRMB:

 


data have;
  input Transactionid         Date :mmddyy10.    FirmA :$5.    FirmB :$5.     expected_rd expected_rc;
  format date date9.;
datalines;
 1     06/30/2000      11111     10004               .                                           .
 2     09/16/2000      11111     15266               0                                          0
 3     09/30/2000      11111     10004               1                                          1
 4     11/01/2000      11111     17789               0                                          0
 5     01/30/2001      11111     10004               1                                          2
 6     06/30/2002      11111     15266               1                                          1
 7     02/04/2010      11111     10004               0                                          0
 8     06/30/2000      14551    10004               .                                          .
 9     09/11/2002      14551    10004               1                                          1
10     01/08/2003      14551    10004               1                                          2
11     02/22/2005      14551    10004               1                                          3
run; 

data want (drop=d _:);
  set have;
  by firma date;
  array _hist{20};  format _hist: date9. ;
  if _n_=1 then do;
    call missing(_dmin,_dmax);
    declare hash h ();
      h.definekey('firma','firmb');
      h.definedata('_dmin','_dmax');
      do d=1 to dim(_hist);
        h.definedata(cats('_hist',d));
      end;
      h.definedone();
  end;
  else if first.firma then h.clear();
  if h.find()^=0 then do;
     _dmin=1;
     _dmax=1;
  end;

  else do;
    _cutoff_date=intnx('year',date,-5,'same');
    do d=_dmin to _dmax while (_hist{d}<_cutoff_date);
      _hist{d}=.;
    end;
    _dmin=d;
    _dmax=_dmax+1; 
  end;
  _hist{_dmax}=date;
  h.replace();
  rcount=_dmax-_dmin;
  rdummy=(rcount)>0;
  if first.firma then call missing(rdummy,rcount);
run;

Make the size of array _HIST large enough to accommodate the longest expected history.  I made it 20.

 

BTW, it's not clear to me why RCOUNT and RDUMMY should be 0's for the initial record of each FIRMA/FIRMB combination - EXCEPT for the earliest record of FIRMA, when you have RCOUNT and RDUMMY as missing.  However, that feature is replicated above - i.e. my RCOUNT equals you original EXPECTED_RC, and RDUMMY=EXPECTED_RD.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

5 REPLIES 5
ballardw
Super User

Some questions to consider:

Does the firm that appears as a value in FirmB, such as 10004 ever also appear as a FirmA elsewhere? Or is the ORDER of the firm ids important to the process? In other words does a transaction with 11111 as FirmA and 10004 as FirmB have to be considered with 10004 as FirmA and 11111 as FirmB?

 

Since it appears you may want to consider things in order then the above question is needed to set in which order things are used.

 

 

Over what duration is the Relationship count calculated. You are showing at least 10 years of transaction dates. If a pair of firms have one transaction a year in each of 10 years what would you expect the Relationship count to show? Or once every three years? Your description does not clearly describe the interaction with the count and the "five year" requirement.

azsham
Obsidian | Level 7

@ballardw wrote:

Some questions to consider:

Does the firm that appears as a value in FirmB, such as 10004 ever also appear as a FirmA elsewhere? Or is the ORDER of the firm ids important to the process? In other words does a transaction with 11111 as FirmA and 10004 as FirmB have to be considered with 10004 as FirmA and 11111 as FirmB?

 

Since it appears you may want to consider things in order then the above question is needed to set in which order things are used.

 

 

Over what duration is the Relationship count calculated. You are showing at least 10 years of transaction dates. If a pair of firms have one transaction a year in each of 10 years what would you expect the Relationship count to show? Or once every three years? Your description does not clearly describe the interaction with the count and the "five year" requirement.


Q1:

Does the firm that appears as a value in FirmB, such as 10004 ever also appear as a FirmA elsewhere?

 

Answer:

No. Firm ids do not appear in the other column. Firm A shows retail firms ids while Firm B column shows Suppliers. Thus, once a firm appeared in Firm A column, I know for sure it is not going to appear in column Firm B.

 

Q2:

If a pair of firms have one transaction a year in each of 10 years what would you expect the Relationship count to show?

 

Answer:

Here is an example that answers your question:

 

Transaction id        Firm A         Firm B        Date       Relationship count

          1                    188             1414         2001                  .

          2                    188             1414         2002                  1

          3                    188             1414         2003                  2

          4                    188             1414         2004                  3

          5                    188             1414         2005                  4

          6                    188             1414         2006                  5

          7                    188             1414         2007                  5

          8                    188             1414         2008                  5

          9                    188             1414         2009                  5

          10                  188             1414         2010                  5

 

 

Q3:

Or once every three years? 

 

Answer: relationship count would show 1 interaction in the previous five years as the interaction between these two firms happens every 3 years. Here is an example:

 

Transaction id        Firm A         Firm B        Date       Relationship count

          1                    188             1414         2001                  .

          2                    188             1414         2004                  1

          3                    188             1414         2007                  1

          4                    188             1414         2010                  1

mkeintz
PROC Star

Your data are sorted by FIRMA/DATE, but your interaction history is based on FIRMA/FIRMB.  This is a good situation to use a hash object keyed on FIRMA/FIRMB:

 


data have;
  input Transactionid         Date :mmddyy10.    FirmA :$5.    FirmB :$5.     expected_rd expected_rc;
  format date date9.;
datalines;
 1     06/30/2000      11111     10004               .                                           .
 2     09/16/2000      11111     15266               0                                          0
 3     09/30/2000      11111     10004               1                                          1
 4     11/01/2000      11111     17789               0                                          0
 5     01/30/2001      11111     10004               1                                          2
 6     06/30/2002      11111     15266               1                                          1
 7     02/04/2010      11111     10004               0                                          0
 8     06/30/2000      14551    10004               .                                          .
 9     09/11/2002      14551    10004               1                                          1
10     01/08/2003      14551    10004               1                                          2
11     02/22/2005      14551    10004               1                                          3
run; 

data want (drop=d _:);
  set have;
  by firma date;
  array _hist{20};  format _hist: date9. ;
  if _n_=1 then do;
    call missing(_dmin,_dmax);
    declare hash h ();
      h.definekey('firma','firmb');
      h.definedata('_dmin','_dmax');
      do d=1 to dim(_hist);
        h.definedata(cats('_hist',d));
      end;
      h.definedone();
  end;
  else if first.firma then h.clear();
  if h.find()^=0 then do;
     _dmin=1;
     _dmax=1;
  end;

  else do;
    _cutoff_date=intnx('year',date,-5,'same');
    do d=_dmin to _dmax while (_hist{d}<_cutoff_date);
      _hist{d}=.;
    end;
    _dmin=d;
    _dmax=_dmax+1; 
  end;
  _hist{_dmax}=date;
  h.replace();
  rcount=_dmax-_dmin;
  rdummy=(rcount)>0;
  if first.firma then call missing(rdummy,rcount);
run;

Make the size of array _HIST large enough to accommodate the longest expected history.  I made it 20.

 

BTW, it's not clear to me why RCOUNT and RDUMMY should be 0's for the initial record of each FIRMA/FIRMB combination - EXCEPT for the earliest record of FIRMA, when you have RCOUNT and RDUMMY as missing.  However, that feature is replicated above - i.e. my RCOUNT equals you original EXPECTED_RC, and RDUMMY=EXPECTED_RD.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
azsham
Obsidian | Level 7

Wow! your code seems to do the job. Thank you very so much, @mkeintz !

I will have to double-check rcount and rdummy output, but they seem to be what I want!

 

 

To answer your question:

If firma appears for the first time then rdummy and rcount should be missing (assume firmb id here is 99999). However, if firma chooses another firmb in the next available observation in my data (e.g. firmb now is 12345), then this tells that they did not choose their previous relationship with 99999 (firmb) to by the product. This indicates that this second transaction is not a relationship transaction for firma.

 

 

I choose this argument so I do not lose much of the data as missings if I want all new combinations to be coded as missing. However, how would the code change if we were to assign missing for new combinations? I think it is important to confirm that the analysis results are not affected by the way I construct these two variables.

 

Thank you very much!

mkeintz
PROC Star

@azsham 

 

I understand your reasoning about why only the first date over all firmb's for a given firma has missing rdummy and rcount.  

 

But then, will your research be satisfied with only maintaining within-firmb values of rdummy and rcount?  Might not you want to also generate rdummy_global and rcount_global for each firma accross all firmb's?  It would be easy to do.  

 

In additional to possible rdummy_global and rcount_global variables, you might also want to have a count of distinct firmb's in the last five years - i.e. how much has firma diversified its vendors over the last x years.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 800 views
  • 2 likes
  • 3 in conversation