The next code is probably my typo:
/* DELETE TEMPORARY DATASETS */ proc datasets lib=paper; delete ceo_firm_1 ceo_firm_2 nolist; quit;
it should be:
/* DELETE TEMPORARY DATASETS */
proc datasets lib=paper NOLIST; delete ceo_firm_1 ceo_firm_2 ; quit;
if you run that code and the datasets are already removed - you will get appropriate message.
Feel free to run it.
The same with next code - corrected:
/* CHECK FOR DUPLICATES */
data is_dup;
set paper.CSRP_Monthly_Stock_char4; /* dataset origin containing var to add VOL ? */
by ticker year;
if not (first.year and last.year);
run;
Are there duplicates ? If positive check what are the differences between the observations of the same TICKER and YEAR values.
Have you used the SORT NODUPKEY or added another key variable to the sql join conditions?
In case final data answers to your request, please assign the post as solution.
Hey,
I ran into further errors even after implementing your revisions. For fear of getting lost in the code (my results are due relatively soon), I have copy-and-pasted my original code back into my program. If you would like to know what the errors were, let me know and I can go back and try to retrieve the log for you. Otherwise, thanks so much for trying to help me get through this issue. I may post a question in on the board relating to the original code you offered me. Feel free to partake in that conversation. Thanks again, and sorry I didn't have the time to work with you to figure this one out.
I asked you to check for duplicates by running:
/* CHECK FOR DUPLICATES */
data is_dup;
set paper.CSRP_Monthly_Stock_char4; /* dataset origin containing var to add VOL ? */
by ticker year;
if not (first.year and last.year);
run;
Please post - upto 10 obs - from the result: is_dup dataset, unless the dataset is empty.
You have not answered my question - are there duplicates ?
Here are the results from the code you posted:
row | ticker | date | year |
2 | A | 2010-02-26 | 2010 |
3 | A | 2010-03-31 | 2010 |
4 | A | 2010-04-30 | 2010 |
5 | A | 2010-05-28 | 2010 |
6 | A | 2010-06-30 | 2010 |
7 | A | 2010-07-30 | 2010 |
8 | A | 2010-08-31 | 2010 |
9 | A | 2010-09-30 | 2010 |
10 | A | 2010-10-29 | 2010 |
11 | A | 2010-11-30 | 2010 |
12 | A | 2010-12-31 | 2010 |
13 | A | 2011-01-31 | 2011 |
14 | A | 2011-02-28 | 2011 |
15 | A | 2011-03-31 | 2011 |
16 | A | 2011-04-29 | 2011 |
17 | A | 2011-05-31 | 2011 |
18 | A | 2011-06-30 | 2011 |
19 | A | 2011-07-29 | 2011 |
20 | A | 2011-08-31 | 2011 |
and according to this code:
*find duplicates: ticker/year combo;
proc freq data=paper.CSRP_Monthly_Stock_char4;
tables ticker * year / noprint
out=paper.multiple_tyCSRP (where=(count > 1));
run;
there are duplicates. This is a snapshot of the output:
row | ticker | year | count | percent |
2 | A | 2011 | 12 | 0.00198938 |
3 | A | 2012 | 12 | 0.00198938 |
4 | A | 2013 | 12 | 0.00198938 |
5 | A | 2014 | 12 | 0.00198938 |
6 | A | 2015 | 12 | 0.00198938 |
7 | A | 2016 | 12 | 0.00198938 |
8 | A | 2017 | 12 | 0.00198938 |
9 | AA | 2010 | 12 | 0.00198938 |
10 | AA | 2011 | 12 | 0.00198938 |
11 | AA | 2012 | 12 | 0.00198938 |
12 | AA | 2013 | 12 | 0.00198938 |
13 | AA | 2014 | 12 | 0.00198938 |
14 | AA | 2015 | 12 | 0.00198938 |
15 | AA | 2016 | 11 | 0.0018235984 |
16 | AA | 2017 | 12 | 0.00198938 |
17 | AAAP | 2016 | 12 | 0.00198938 |
18 | AAAP | 2017 | 12 | 0.00198938 |
19 | AABA | 2017 | 6 | 0.00099469 |
As much as I see the key variables to get distinct values (I used previously the term unique)
are: ROW TICKER YEAR which means that the code to bring the new variable should be:
*bring variable in from other data set;
proc sql;
create table want as
select a.* , b.VOL
from paper.ceo_firm as a
left join paper.CSRP_Monthly_Stock_char4 as b
on a.ROW = b.ROW and /* <<< line added <<< */
a.ticker = b.ticker and
a.year = b.year;
quit;
Thanks. Here's the new log:
Errors (4) Warnings Notes (3) 1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 70 71 *bring variable in from other data set; 72 proc sql; 73 create table want as 74 select a.* , b.VOL 75 from paper.ceo_firm as a 76 left join paper.CSRP_Monthly_Stock_char4 as b 77 on a.ROW = b.ROW and 78 a.ticker = b.ticker and 79 a.year = b.year; ERROR: Column ROW could not be found in the table/view identified with the correlation name A. ERROR: Column ROW could not be found in the table/view identified with the correlation name A. ERROR: Column ROW could not be found in the table/view identified with the correlation name B. ERROR: Column ROW could not be found in the table/view identified with the correlation name B. NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements. 80 quit; NOTE: The SAS System stopped processing this step because of errors. NOTE: PROCEDURE SQL used (Total process time): real time 0.00 seconds user cpu time 0.00 seconds system cpu time 0.00 seconds memory 8604.56k OS Memory 36272.00k Timestamp 04/15/2018 01:19:36 PM Step Count 40 Switch Count 0 Page Faults 0 Page Reclaims 532 Page Swaps 0 Voluntary Context Switches 8 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 0 81 82 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 94
@SASKiwi helped me out, by the way. In case you're curious, here's the code he came up with:
proc sql;
create table ceo_firm as
select a.* , b.VOL
from paper.ceo_firm as a
left join
(select ticker
,year
,sum(VOL) as VOL
from paper.CSRP_Monthly_Stock_char4
group by ticker
,year
) as b
on a.ticker = b.ticker and
a.year = b.year;
quit;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.