SAS Programming

Pabster · Posted 02-25-2019 04:00 PM

Hi,

After managing to almost finish my code I realized I forgot to add a variable.
What I am currently trying to do is add a command that will chose the closest matching variable do another one.
Right now my code does the following : Matches an inactive company (with dlrsn= 02 or 03) to an active company from the same Industry (Sic Code) with the closest Fyear to the deletion date (DLDTE) of the Inactive company)

As you can see in the picture of my current data set there is a value COSTAT (which tells me when a company is active(0) or inactive(1)) and a value AT (total assets).
I need to make a pair sample. So when a company is inactive (COSTAT=1) I need sas to be able to chose an Active (costat=0) company with the closest AT to the Inactive company

PaigeMiller · Posted 02-25-2019 04:29 PM

So, I'm not sure I am understanding the problem. It seems as if the first paragraph is not needed at all to perform the task desired in the second paragraph. Is that right?

Nevertheless, the second paragraph can be accomplished as follows:

/* UNTESTED CODE */

proc sql;
    create table want as select h.gvkey as matched company,h.at as matched_at,
        g.at as at_to_match 
    from have(where=(costat=0)) as h,have(where=(costat=1)) as g
    group by g.at
    having min(abs(h.at-g.at));
quit;

--
Paige Miller

Pabster · Posted 02-25-2019 04:40 PM

Yeah the first paragraph was mostly explanatory just in case.
But the code didn't run.
Is this how I was suppose to write it down?

proc sql;
create table want as select h.gvkey as matched company,h.at as matched_at,
g.at as at_to_match
from Altman.Size(where=(costat=0)) as h,Altman.size(where=(costat=1)) as g
group by g.at
having min(abs(h.at-g.at));
quit;

My log screen tells me the following:

1656 proc sql;
1657 create table want as select h.gvkey as matched company,h.at as matched_at,
-------
22
ERROR 22-322: Syntax error, expecting one of the following: ',', AS, FROM.

1658 g.at as at_to_match
1659 from Altman.Size(where=(costat=0)) as h,Altman.size(where=(costat=1)) as g
----
22
76
ERROR 22-322: Syntax error, expecting one of the following: ;, ',', ANSIMISS, CROSS, EXCEPT,
FULL, GROUP, HAVING, INNER, INTERSECT, JOIN, LEFT, NATURAL, NOMISS, ORDER, OUTER,
RIGHT, UNION, WHERE.

ERROR 76-322: Syntax error, statement will be ignored.

1660 group by g.at
1661 having min(abs(h.at-g.at));
1662 quit;

PaigeMiller · Posted 02-25-2019 05:54 PM

Click on the {i} icon, and paste the log into the window that appears. Do not paste the log into your message without doing this.

--
Paige Miller

Pabster · Posted 02-26-2019 09:14 AM

1716  proc sql;
1717      create table Altman.ATSize as
1718      select h.gvkey as matched company,
                                    -------
                                    22
ERROR 22-322: Syntax error, expecting one of the following: ',', AS, FROM.

1719             h.at as matched_at,
1720             g.at as at_to_match,
1721      from Altman.Size
                     -
                     22
                     76
ERROR 22-322: Syntax error, expecting one of the following: ;, ',', ANSIMISS, CROSS, EXCEPT,
              FULL, GROUP, HAVING, INNER, INTERSECT, JOIN, LEFT, NATURAL, NOMISS, ORDER, OUTER,
              RIGHT, UNION, WHERE.

ERROR 76-322: Syntax error, statement will be ignored.

1722      where=(costat=0) as h and where=(costat=1) as g
1723      group by g.at
1724      having min(abs(h.at-g.at));
1725  quit;

does this help?

Pabster · Posted 02-26-2019 01:02 PM

*UPDATE*

This is the closest I have managed to make your code work with my data:

The code:

proc sql;
    create table CALICE as 
		select h.GVKEY as matched_company, h.AT as matched_at,
        g.AT as at_to_match 
    from Altman.size as h, Altman.size as g
	where h.COSTAT eq '1' and g.COSTAT eq '0'
	group by g.AT
    having min(abs(h.AT-g.AT));
quit;

I have obtained the following table:

To be honest I have no idea as how I am going to obtain my final table from this , but at least it works and I can go on !

Reeza · Posted 02-25-2019 04:30 PM

Take a look at PROC PSMATCH perhaps?

Pabster · Posted 02-25-2019 05:05 PM

Yeah I tried but I'm not very advanced so I don't really understand the logic on how I can apply it to my program...

mkeintz · Posted 02-25-2019 09:00 PM

I suspect the SQL approach will do a Cartesian crossing of all inactive AT vs active AT values to find minimum distance - a very expensive approach.

I suggest loading all active company AT values into a hash object ACTIVES, ordered by MATCHED_AT. The hash object can be used as a very fast lookup table.

Then, while reading the inactive companies one by one:

See if the inactive company AT is exactly matched in ACTIVES hash. If so, skip to major step below.
If not (rc^=0) then
1. Insert a dummy data item for the inactive company AT into the ordered hash of active company AT values. Because it's an ordered hash, it will be between 2 active company AT values.
2. Point the iterator at the inserted dataitem.
3. Move the iterator up by one data item (i.e. retrieve the next higher active-company AT).
4. Copy the higher AT and its gvkey to temporary variables.
5. Re-point the iterator to inactive company AT value.
6. Move the iterator down by one data item
7. Compare both to inactive AT and choose the closest.
8. Remove the inserted inactive company
Remove the matched AT dataitem - this is needed if you want to guarantee that no active company is matched twice.

The above sequence is almost certain to be much faster than the SQL approach. Here's the program.

data want (drop=_:);
  if _n_=1 then do;
    /*Make Lookup Table of active companies, with AT as lookup key */
    if 0 then
      set have (keep=costat at gvkey rename=(at=matched_at gvkey=matched_gvkey));
    declare hash
      actives(ordered:'A'
             ,dataset:'have(where=(costat=1 and not missing(at))
                            keep=costat at gvkey
                            rename=(at=matched_at gvkey=matched_gvkey))');
      actives.definekey('matched_at');
      actives.definedata('matched_at','matched_gvkey');
      actives.definedone();
    declare hiter hi ('actives');
  end;

  /*Read inactive companies, one at a time*/
  set have (where=(costat=0));

  /*Use inactive AT as a lookup key */
  _rc=actives.find(key:AT);

  /*If not found, then check higher and lower neighbors to find nearest*/
  if _rc^=0 then do;
    /* Insert inactive AT into the ordered hash of active ATs*/
    actives.add(key:AT,data:.,data:.);

    /* Point iterator at the inserted value */
    hi.setcur(key:AT);

    /* Iterate to next higher neighbor and save values */
    _rc=hi.next();
    _at_next=matched_at;
    _gvkey_next=matched_gvkey;

    /* Re-point iterator and iterate down one neighbor */
    hi.setcur(key:AT);
    _rc=hi.prev();

    /* Choose closest of the two */
    if (_at_next-at) < (at-matched_at) then do;
      matched_at=_at_next;
      matched_gvkey=_gvkey_next;
    end;

    _rc=hi.prev();
    _rc=actives.remove(key:at);
  end;
  _rc=actives.remove(key:matched_at);
run;

Notice the hash object ACTIVES is "instantiated" inside the "if _N_=1 then do" code group. The automatic variable _N_ counts the number of iterations in the data step (i.e. _N_ goes from 1 to the number of companies read by the "set have (where=(costat=0))" statement). Because the hash object persists across observations, it should be instantiated only once - namely in the first data step iteration. Hence the "if _N_=1 the ..." group.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

PaigeMiller · Posted 02-26-2019 07:54 AM

@mkeintz wrote:

I suspect the SQL approach will do a Cartesian crossing of all inactive AT vs active AT values to find minimum distance - a very expensive approach.

A trade off between execution time and programming time (and knowledge), l which also can be expensive. I can't program a hash solution. I don't know how long this will run with the original poster's data, but I have run it on what I consider to be moderate size data sets (hundreds of observations) and I don't have a problem with execution time.

--
Paige Miller

Pabster · Posted 02-26-2019 09:11 AM

as Paige said, this might this be a little less time consuming when running moderate data sets but mine goes for 100s of thousands of lines so it might be a little to complicated to actually make it work

Pabster · Posted 02-26-2019 12:19 PM

I gave it a try with the following code:

data want (drop=_:);
  if _n_=1 then do;
    /*Make Lookup Table of active companies, with AT as lookup key */
    if 0 then
      set Altman.Size (keep=costat at gvkey rename=(at=matched_at gvkey=matched_gvkey));
    declare hash
      actives(ordered:'A'
             ,dataset:'have(where=(costat=1 and not missing(at))
                            keep=costat at gvkey
                            rename=(at=matched_at gvkey=matched_gvkey))');
      actives.definekey('matched_at');
      actives.definedata('matched_at','matched_gvkey');
      actives.definedone();
    declare hiter hi ('actives');
  end;

  /*Read inactive companies, one at a time*/
  set Altman.Size (where=(COSTAT=1));

  /*Use inactive AT as a lookup key */
  _rc=actives.find(key:AT);

  /*If not found, then check higher and lower neighbors to find nearest*/
  if _rc^=0 then do;
    /* Insert inactive AT into the ordered hash of active ATs*/
    actives.add(key:AT,data:.,data:.);

    /* Point iterator at the inserted value */
    hi.setcur(key:AT);

    /* Iterate to next higher neighbor and save values */
    _rc=hi.next();
    _at_next=matched_at;
    _gvkey_next=matched_gvkey;

    /* Re-point iterator and iterate down one neighbor */
    hi.setcur(key:AT);
    _rc=hi.prev();

    /* Choose closest of the two */
    if (_at_next-at) < (at-matched_at) then do;
      matched_at=_at_next;
      matched_gvkey=_gvkey_next;
    end;

    _rc=hi.prev();
    _rc=actives.remove(key:at);
  end;
  _rc=actives.remove(key:matched_at);
run;

but my log tells me this:

2734  data want (drop=_:);
2735    if _n_=1 then do;
2736      /*Make Lookup Table of active companies, with AT as lookup key */
2737      if 0 then
2738        set Altman.Size (keep=costat at gvkey rename=(at=matched_at gvkey=matched_gvkey));
2739      declare hash
2740        actives(ordered:'A'
2741               ,dataset:'have(where=(costat=1 and not missing(at))
2742                              keep=costat at gvkey
2743                              rename=(at=matched_at gvkey=matched_gvkey))');
2744        actives.definekey('matched_at');
2745        actives.definedata('matched_at','matched_gvkey');
2746        actives.definedone();
2747      declare hiter hi ('actives');
2748    end;
2749
2750    /*Read inactive companies, one at a time*/
2751    set Altman.Size (where=(COSTAT=1));
ERROR: WHERE clause operator requires compatible variables.
2752
2753    /*Use inactive AT as a lookup key */
2754    _rc=actives.find(key:AT);
2755
2756    /*If not found, then check higher and lower neighbors to find nearest*/
2757    if _rc^=0 then do;
2758      /* Insert inactive AT into the ordered hash of active ATs*/
2759      actives.add(key:AT,data:.,data:.);
2760
2761      /* Point iterator at the inserted value */
2762      hi.setcur(key:AT);
2763
2764      /* Iterate to next higher neighbor and save values */
2765      _rc=hi.next();
2766      _at_next=matched_at;
2767      _gvkey_next=matched_gvkey;
2768
2769      /* Re-point iterator and iterate down one neighbor */
2770      hi.setcur(key:AT);
2771      _rc=hi.prev();
2772
2773      /* Choose closest of the two */
2774      if (_at_next-at) < (at-matched_at) then do;
2775        matched_at=_at_next;
2776        matched_gvkey=_gvkey_next;
2777      end;
2778
2779      _rc=hi.prev();
2780      _rc=actives.remove(key:at);
2781    end;
2782    _rc=actives.remove(key:matched_at);
2783  run;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.WANT may be incomplete.  When this step was stopped there were 0
         observations and 4 variables.
WARNING: Data set WORK.WANT was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.06 seconds
      cpu time            0.04 seconds

Reeza · Posted 02-26-2019 01:16 PM

@Pabster the fix to @mkeintz code is to add brackets.

 set Altman.Size (where=(COSTAT="1"));

This error means the types are mismatched, so that it's being treated as a numeric variable when it's actually a character variable.

ERROR: WHERE clause operator requires compatible variables.

Pabster · Posted 02-26-2019 01:57 PM

Yeah I figured it out but now I am stock with this code:

data want (drop=_:);
  if _n_=1 then do;
    /*Make Lookup Table of active companies, with AT as lookup key */
    if 0 then
      set Altman.size (keep=COSTAT AT GVKEY rename=(AT=matched_at GVKEY=matched_gvkey));
    declare hash
      actives(ordered:'A'
             ,dataset:'Altman.size(where=(COSTAT='1' and not missing(at))
                            keep=COSTAT AT GVKEY
                            rename=(AT=matched_at GVKEY=matched_gvkey))');
      actives.definekey('matched_at');
      actives.definedata('matched_at','matched_gvkey');
      actives.definedone();
    declare hiter hi ('actives');
  end;

  /*Read inactive companies, one at a time*/
  set Altman.size (where=(COSTAT='1'));

  /*Use inactive AT as a lookup key */
  _rc=actives.find(key:AT);

  /*If not found, then check higher and lower neighbors to find nearest*/
  if _rc^=0 then do;
    /* Insert inactive AT into the ordered hash of active ATs*/
    actives.add(key:AT,data:.,data:.);

    /* Point iterator at the inserted value */
    hi.setcur(key:AT);

    /* Iterate to next higher neighbor and save values */
    _rc=hi.next();
    _at_next=matched_at;
    _gvkey_next=matched_gvkey;

    /* Re-point iterator and iterate down one neighbor */
    hi.setcur(key:AT);
    _rc=hi.prev();

    /* Choose closest of the two */
    if (_at_next-at) < (at-matched_at) then do;
      matched_at=_at_next;
      matched_gvkey=_gvkey_next;
    end;

    _rc=hi.prev();
    _rc=actives.remove(key:at);
  end;
  _rc=actives.remove(key:matched_at);
run;

and this log :

3536  data want (drop=_:);
3537    if _n_=1 then do;
3538      /*Make Lookup Table of active companies, with AT as lookup key */
3539      if 0 then
3540        set Altman.size (keep=COSTAT AT GVKEY rename=(AT=matched_at GVKEY=matched_gvkey));
3541      declare hash
3542        actives(ordered:'A'
3543               ,dataset:'Altman.size(where=(COSTAT='1' and not missing(at))
                            -----------------------------
                            49                          388
                                                        200
NOTE 49-169: The meaning of an identifier after a quoted string might change in a future SAS
             release.  Inserting white space between a quoted string and the succeeding
             identifier is recommended.

ERROR 388-185: Expecting an arithmetic operator.

ERROR 200-322: The symbol is not recognized and will be ignored.

3544                              keep=COSTAT AT GVKEY
3545                              rename=(AT=matched_at GVKEY=matched_gvkey))');
3546        actives.definekey('matched_at');
ERROR: DATA STEP Component Object failure.  Aborted during the COMPILATION phase.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
      real time           0.02 seconds
      cpu time            0.06 seconds

3547        actives.definedata('matched_at','matched_gvkey');
3548        actives.definedone();
3549      declare hiter hi ('actives');
3550    end;
3551
3552    /*Read inactive companies, one at a time*/
3553    set Altman.size (where=(COSTAT='1'));
3554
3555    /*Use inactive AT as a lookup key */
3556    _rc=actives.find(key:AT);
3557
3558    /*If not found, then check higher and lower neighbors to find nearest*/
3559    if _rc^=0 then do;
3560      /* Insert inactive AT into the ordered hash of active ATs*/
3561      actives.add(key:AT,data:.,data:.);
3562
3563      /* Point iterator at the inserted value */
3564      hi.setcur(key:AT);
3565
3566      /* Iterate to next higher neighbor and save values */
3567      _rc=hi.next();
3568      _at_next=matched_at;
3569      _gvkey_next=matched_gvkey;
3570
3571      /* Re-point iterator and iterate down one neighbor */
3572      hi.setcur(key:AT);
3573      _rc=hi.prev();
3574
3575      /* Choose closest of the two */
3576      if (_at_next-at) < (at-matched_at) then do;
3577        matched_at=_at_next;
3578        matched_gvkey=_gvkey_next;
3579      end;
3580
3581      _rc=hi.prev();
3582      _rc=actives.remove(key:at);
3583    end;
3584    _rc=actives.remove(key:matched_at);
3585  run;

mkeintz · Posted 02-26-2019 02:01 PM

OK, you have a quote within a quote, and you need a way to tell SAS that the second quote is not the end of the value. Change either the outer pair to double quotes or the inner pair to double quotes. Then you won't confuse SAS.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Programming

Finding Closest Value

Re: Finding Closest Value

Re: Finding Closest Value

Re: Finding Closest Value

Re: Finding Closest Value

Re: Finding Closest Value

Re: Finding Closest Value

Re: Finding Closest Value

Re: Finding Closest Value

Re: Finding Closest Value

Re: Finding Closest Value

Re: Finding Closest Value

Re: Finding Closest Value

Re: Finding Closest Value

Re: Finding Closest Value

ModelOps, MLOps, and Finding Value in Analytics

Finding the closest value from two datasets

Find Columns with Missing Values in a list of tables

Finding common value from two datasets

Find duplicated values in SAS EG

Follow Us

What is...

SAS Programming

Our biggest data and AI event of the year.

SAS Training: Just a Click Away

Follow Us

What is...