- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
After managing to almost finish my code I realized I forgot to add a variable.
What I am currently trying to do is add a command that will chose the closest matching variable do another one.
Right now my code does the following : Matches an inactive company (with dlrsn= 02 or 03) to an active company from the same Industry (Sic Code) with the closest Fyear to the deletion date (DLDTE) of the Inactive company)
As you can see in the picture of my current data set there is a value COSTAT (which tells me when a company is active(0) or inactive(1)) and a value AT (total assets).
I need to make a pair sample. So when a company is inactive (COSTAT=1) I need sas to be able to chose an Active (costat=0) company with the closest AT to the Inactive company
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
So, I'm not sure I am understanding the problem. It seems as if the first paragraph is not needed at all to perform the task desired in the second paragraph. Is that right?
Nevertheless, the second paragraph can be accomplished as follows:
/* UNTESTED CODE */
proc sql;
create table want as select h.gvkey as matched company,h.at as matched_at,
g.at as at_to_match
from have(where=(costat=0)) as h,have(where=(costat=1)) as g
group by g.at
having min(abs(h.at-g.at));
quit;
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yeah the first paragraph was mostly explanatory just in case.
But the code didn't run.
Is this how I was suppose to write it down?
proc sql;
create table want as select h.gvkey as matched company,h.at as matched_at,
g.at as at_to_match
from Altman.Size(where=(costat=0)) as h,Altman.size(where=(costat=1)) as g
group by g.at
having min(abs(h.at-g.at));
quit;
My log screen tells me the following:
1656 proc sql;
1657 create table want as select h.gvkey as matched company,h.at as matched_at,
-------
22
ERROR 22-322: Syntax error, expecting one of the following: ',', AS, FROM.
1658 g.at as at_to_match
1659 from Altman.Size(where=(costat=0)) as h,Altman.size(where=(costat=1)) as g
----
22
76
ERROR 22-322: Syntax error, expecting one of the following: ;, ',', ANSIMISS, CROSS, EXCEPT,
FULL, GROUP, HAVING, INNER, INTERSECT, JOIN, LEFT, NATURAL, NOMISS, ORDER, OUTER,
RIGHT, UNION, WHERE.
ERROR 76-322: Syntax error, statement will be ignored.
1660 group by g.at
1661 having min(abs(h.at-g.at));
1662 quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Click on the {i} icon, and paste the log into the window that appears. Do not paste the log into your message without doing this.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
1716 proc sql; 1717 create table Altman.ATSize as 1718 select h.gvkey as matched company, ------- 22 ERROR 22-322: Syntax error, expecting one of the following: ',', AS, FROM. 1719 h.at as matched_at, 1720 g.at as at_to_match, 1721 from Altman.Size - 22 76 ERROR 22-322: Syntax error, expecting one of the following: ;, ',', ANSIMISS, CROSS, EXCEPT, FULL, GROUP, HAVING, INNER, INTERSECT, JOIN, LEFT, NATURAL, NOMISS, ORDER, OUTER, RIGHT, UNION, WHERE. ERROR 76-322: Syntax error, statement will be ignored. 1722 where=(costat=0) as h and where=(costat=1) as g 1723 group by g.at 1724 having min(abs(h.at-g.at)); 1725 quit;
does this help?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
*UPDATE*
This is the closest I have managed to make your code work with my data:
The code:
proc sql;
create table CALICE as
select h.GVKEY as matched_company, h.AT as matched_at,
g.AT as at_to_match
from Altman.size as h, Altman.size as g
where h.COSTAT eq '1' and g.COSTAT eq '0'
group by g.AT
having min(abs(h.AT-g.AT));
quit;
I have obtained the following table:
To be honest I have no idea as how I am going to obtain my final table from this , but at least it works and I can go on !
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yeah I tried but I'm not very advanced so I don't really understand the logic on how I can apply it to my program...
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I suspect the SQL approach will do a Cartesian crossing of all inactive AT vs active AT values to find minimum distance - a very expensive approach.
I suggest loading all active company AT values into a hash object ACTIVES, ordered by MATCHED_AT. The hash object can be used as a very fast lookup table.
Then, while reading the inactive companies one by one:
- See if the inactive company AT is exactly matched in ACTIVES hash. If so, skip to major step below.
- If not (rc^=0) then
- Insert a dummy data item for the inactive company AT into the ordered hash of active company AT values. Because it's an ordered hash, it will be between 2 active company AT values.
- Point the iterator at the inserted dataitem.
- Move the iterator up by one data item (i.e. retrieve the next higher active-company AT).
- Copy the higher AT and its gvkey to temporary variables.
- Re-point the iterator to inactive company AT value.
- Move the iterator down by one data item
- Compare both to inactive AT and choose the closest.
- Remove the inserted inactive company
- Remove the matched AT dataitem - this is needed if you want to guarantee that no active company is matched twice.
The above sequence is almost certain to be much faster than the SQL approach. Here's the program.
data want (drop=_:);
if _n_=1 then do;
/*Make Lookup Table of active companies, with AT as lookup key */
if 0 then
set have (keep=costat at gvkey rename=(at=matched_at gvkey=matched_gvkey));
declare hash
actives(ordered:'A'
,dataset:'have(where=(costat=1 and not missing(at))
keep=costat at gvkey
rename=(at=matched_at gvkey=matched_gvkey))');
actives.definekey('matched_at');
actives.definedata('matched_at','matched_gvkey');
actives.definedone();
declare hiter hi ('actives');
end;
/*Read inactive companies, one at a time*/
set have (where=(costat=0));
/*Use inactive AT as a lookup key */
_rc=actives.find(key:AT);
/*If not found, then check higher and lower neighbors to find nearest*/
if _rc^=0 then do;
/* Insert inactive AT into the ordered hash of active ATs*/
actives.add(key:AT,data:.,data:.);
/* Point iterator at the inserted value */
hi.setcur(key:AT);
/* Iterate to next higher neighbor and save values */
_rc=hi.next();
_at_next=matched_at;
_gvkey_next=matched_gvkey;
/* Re-point iterator and iterate down one neighbor */
hi.setcur(key:AT);
_rc=hi.prev();
/* Choose closest of the two */
if (_at_next-at) < (at-matched_at) then do;
matched_at=_at_next;
matched_gvkey=_gvkey_next;
end;
_rc=hi.prev();
_rc=actives.remove(key:at);
end;
_rc=actives.remove(key:matched_at);
run;
Notice the hash object ACTIVES is "instantiated" inside the "if _N_=1 then do" code group. The automatic variable _N_ counts the number of iterations in the data step (i.e. _N_ goes from 1 to the number of companies read by the "set have (where=(costat=0))" statement). Because the hash object persists across observations, it should be instantiated only once - namely in the first data step iteration. Hence the "if _N_=1 the ..." group.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@mkeintz wrote:
I suspect the SQL approach will do a Cartesian crossing of all inactive AT vs active AT values to find minimum distance - a very expensive approach.
A trade off between execution time and programming time (and knowledge), l which also can be expensive. I can't program a hash solution. I don't know how long this will run with the original poster's data, but I have run it on what I consider to be moderate size data sets (hundreds of observations) and I don't have a problem with execution time.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
as Paige said, this might this be a little less time consuming when running moderate data sets but mine goes for 100s of thousands of lines so it might be a little to complicated to actually make it work
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I gave it a try with the following code:
data want (drop=_:); if _n_=1 then do; /*Make Lookup Table of active companies, with AT as lookup key */ if 0 then set Altman.Size (keep=costat at gvkey rename=(at=matched_at gvkey=matched_gvkey)); declare hash actives(ordered:'A' ,dataset:'have(where=(costat=1 and not missing(at)) keep=costat at gvkey rename=(at=matched_at gvkey=matched_gvkey))'); actives.definekey('matched_at'); actives.definedata('matched_at','matched_gvkey'); actives.definedone(); declare hiter hi ('actives'); end; /*Read inactive companies, one at a time*/ set Altman.Size (where=(COSTAT=1)); /*Use inactive AT as a lookup key */ _rc=actives.find(key:AT); /*If not found, then check higher and lower neighbors to find nearest*/ if _rc^=0 then do; /* Insert inactive AT into the ordered hash of active ATs*/ actives.add(key:AT,data:.,data:.); /* Point iterator at the inserted value */ hi.setcur(key:AT); /* Iterate to next higher neighbor and save values */ _rc=hi.next(); _at_next=matched_at; _gvkey_next=matched_gvkey; /* Re-point iterator and iterate down one neighbor */ hi.setcur(key:AT); _rc=hi.prev(); /* Choose closest of the two */ if (_at_next-at) < (at-matched_at) then do; matched_at=_at_next; matched_gvkey=_gvkey_next; end; _rc=hi.prev(); _rc=actives.remove(key:at); end; _rc=actives.remove(key:matched_at); run;
but my log tells me this:
2734 data want (drop=_:); 2735 if _n_=1 then do; 2736 /*Make Lookup Table of active companies, with AT as lookup key */ 2737 if 0 then 2738 set Altman.Size (keep=costat at gvkey rename=(at=matched_at gvkey=matched_gvkey)); 2739 declare hash 2740 actives(ordered:'A' 2741 ,dataset:'have(where=(costat=1 and not missing(at)) 2742 keep=costat at gvkey 2743 rename=(at=matched_at gvkey=matched_gvkey))'); 2744 actives.definekey('matched_at'); 2745 actives.definedata('matched_at','matched_gvkey'); 2746 actives.definedone(); 2747 declare hiter hi ('actives'); 2748 end; 2749 2750 /*Read inactive companies, one at a time*/ 2751 set Altman.Size (where=(COSTAT=1)); ERROR: WHERE clause operator requires compatible variables. 2752 2753 /*Use inactive AT as a lookup key */ 2754 _rc=actives.find(key:AT); 2755 2756 /*If not found, then check higher and lower neighbors to find nearest*/ 2757 if _rc^=0 then do; 2758 /* Insert inactive AT into the ordered hash of active ATs*/ 2759 actives.add(key:AT,data:.,data:.); 2760 2761 /* Point iterator at the inserted value */ 2762 hi.setcur(key:AT); 2763 2764 /* Iterate to next higher neighbor and save values */ 2765 _rc=hi.next(); 2766 _at_next=matched_at; 2767 _gvkey_next=matched_gvkey; 2768 2769 /* Re-point iterator and iterate down one neighbor */ 2770 hi.setcur(key:AT); 2771 _rc=hi.prev(); 2772 2773 /* Choose closest of the two */ 2774 if (_at_next-at) < (at-matched_at) then do; 2775 matched_at=_at_next; 2776 matched_gvkey=_gvkey_next; 2777 end; 2778 2779 _rc=hi.prev(); 2780 _rc=actives.remove(key:at); 2781 end; 2782 _rc=actives.remove(key:matched_at); 2783 run; NOTE: The SAS System stopped processing this step because of errors. WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0 observations and 4 variables. WARNING: Data set WORK.WANT was not replaced because this step was stopped. NOTE: DATA statement used (Total process time): real time 0.06 seconds cpu time 0.04 seconds
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yeah I figured it out but now I am stock with this code:
data want (drop=_:);
if _n_=1 then do;
/*Make Lookup Table of active companies, with AT as lookup key */
if 0 then
set Altman.size (keep=COSTAT AT GVKEY rename=(AT=matched_at GVKEY=matched_gvkey));
declare hash
actives(ordered:'A'
,dataset:'Altman.size(where=(COSTAT='1' and not missing(at))
keep=COSTAT AT GVKEY
rename=(AT=matched_at GVKEY=matched_gvkey))');
actives.definekey('matched_at');
actives.definedata('matched_at','matched_gvkey');
actives.definedone();
declare hiter hi ('actives');
end;
/*Read inactive companies, one at a time*/
set Altman.size (where=(COSTAT='1'));
/*Use inactive AT as a lookup key */
_rc=actives.find(key:AT);
/*If not found, then check higher and lower neighbors to find nearest*/
if _rc^=0 then do;
/* Insert inactive AT into the ordered hash of active ATs*/
actives.add(key:AT,data:.,data:.);
/* Point iterator at the inserted value */
hi.setcur(key:AT);
/* Iterate to next higher neighbor and save values */
_rc=hi.next();
_at_next=matched_at;
_gvkey_next=matched_gvkey;
/* Re-point iterator and iterate down one neighbor */
hi.setcur(key:AT);
_rc=hi.prev();
/* Choose closest of the two */
if (_at_next-at) < (at-matched_at) then do;
matched_at=_at_next;
matched_gvkey=_gvkey_next;
end;
_rc=hi.prev();
_rc=actives.remove(key:at);
end;
_rc=actives.remove(key:matched_at);
run;
and this log :
3536 data want (drop=_:);
3537 if _n_=1 then do;
3538 /*Make Lookup Table of active companies, with AT as lookup key */
3539 if 0 then
3540 set Altman.size (keep=COSTAT AT GVKEY rename=(AT=matched_at GVKEY=matched_gvkey));
3541 declare hash
3542 actives(ordered:'A'
3543 ,dataset:'Altman.size(where=(COSTAT='1' and not missing(at))
-----------------------------
49 388
200
NOTE 49-169: The meaning of an identifier after a quoted string might change in a future SAS
release. Inserting white space between a quoted string and the succeeding
identifier is recommended.
ERROR 388-185: Expecting an arithmetic operator.
ERROR 200-322: The symbol is not recognized and will be ignored.
3544 keep=COSTAT AT GVKEY
3545 rename=(AT=matched_at GVKEY=matched_gvkey))');
3546 actives.definekey('matched_at');
ERROR: DATA STEP Component Object failure. Aborted during the COMPILATION phase.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.06 seconds
3547 actives.definedata('matched_at','matched_gvkey');
3548 actives.definedone();
3549 declare hiter hi ('actives');
3550 end;
3551
3552 /*Read inactive companies, one at a time*/
3553 set Altman.size (where=(COSTAT='1'));
3554
3555 /*Use inactive AT as a lookup key */
3556 _rc=actives.find(key:AT);
3557
3558 /*If not found, then check higher and lower neighbors to find nearest*/
3559 if _rc^=0 then do;
3560 /* Insert inactive AT into the ordered hash of active ATs*/
3561 actives.add(key:AT,data:.,data:.);
3562
3563 /* Point iterator at the inserted value */
3564 hi.setcur(key:AT);
3565
3566 /* Iterate to next higher neighbor and save values */
3567 _rc=hi.next();
3568 _at_next=matched_at;
3569 _gvkey_next=matched_gvkey;
3570
3571 /* Re-point iterator and iterate down one neighbor */
3572 hi.setcur(key:AT);
3573 _rc=hi.prev();
3574
3575 /* Choose closest of the two */
3576 if (_at_next-at) < (at-matched_at) then do;
3577 matched_at=_at_next;
3578 matched_gvkey=_gvkey_next;
3579 end;
3580
3581 _rc=hi.prev();
3582 _rc=actives.remove(key:at);
3583 end;
3584 _rc=actives.remove(key:matched_at);
3585 run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
OK, you have a quote within a quote, and you need a way to tell SAS that the second quote is not the end of the value. Change either the outer pair to double quotes or the inner pair to double quotes. Then you won't confuse SAS.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------