Need select variable from other data set in a particular dataset - Page 2

Shmuel · Posted 04-14-2018 12:27 PM

The next code is probably my typo:

/* DELETE TEMPORARY DATASETS */
proc datasets lib=paper; delete ceo_firm_1 ceo_firm_2 nolist; quit;

it should be:

/* DELETE TEMPORARY DATASETS */
proc datasets lib=paper NOLIST; delete ceo_firm_1 ceo_firm_2 ; quit;

if you run that code and the datasets are already removed - you will get appropriate message.

Feel free to run it.

The same with next code - corrected:

/* CHECK FOR DUPLICATES */
data is_dup;
 set paper.CSRP_Monthly_Stock_char4;    /* dataset origin containing var to add VOL ? */
  by ticker year;
     if not (first.year and last.year);
run;

Are there duplicates ? If positive check what are the differences between the observations of the same TICKER and YEAR values.

Have you used the SORT NODUPKEY or added another key variable to the sql join conditions?

In case final data answers to your request, please assign the post as solution.

sastuck · Posted 04-14-2018 07:57 PM

Hey,

I ran into further errors even after implementing your revisions. For fear of getting lost in the code (my results are due relatively soon), I have copy-and-pasted my original code back into my program. If you would like to know what the errors were, let me know and I can go back and try to retrieve the log for you. Otherwise, thanks so much for trying to help me get through this issue. I may post a question in on the board relating to the original code you offered me. Feel free to partake in that conversation. Thanks again, and sorry I didn't have the time to work with you to figure this one out.

Shmuel · Posted 04-15-2018 12:14 AM

I asked you to check for duplicates by running:

/* CHECK FOR DUPLICATES */
data is_dup;
 set paper.CSRP_Monthly_Stock_char4;    /* dataset origin containing var to add VOL ? */
  by ticker year;
     if not (first.year and last.year);
run;

Please post - upto 10 obs - from the result: is_dup dataset, unless the dataset is empty.

You have not answered my question - are there duplicates ?

sastuck · Posted 04-15-2018 12:23 AM

Here are the results from the code you posted:

row

ticker

date

year

2

A

2010-02-26

2010

3

A

2010-03-31

2010

4

A

2010-04-30

2010

5

A

2010-05-28

2010

6

A

2010-06-30

2010

7

A

2010-07-30

2010

8

A

2010-08-31

2010

9

A

2010-09-30

2010

10

A

2010-10-29

2010

11

A

2010-11-30

2010

12

A

2010-12-31

2010

13

A

2011-01-31

2011

14

A

2011-02-28

2011

15

A

2011-03-31

2011

16

A

2011-04-29

2011

17

A

2011-05-31

2011

18

A

2011-06-30

2011

19

A

2011-07-29

2011

20

A

2011-08-31

2011

and according to this code:

*find duplicates: ticker/year combo;
proc freq data=paper.CSRP_Monthly_Stock_char4;
	tables ticker * year / noprint 
		out=paper.multiple_tyCSRP (where=(count > 1));
run;

there are duplicates. This is a snapshot of the output:

row

ticker

year

count

percent

2

A

2011

12

0.00198938

3

A

2012

12

0.00198938

4

A

2013

12

0.00198938

5

A

2014

12

0.00198938

6

A

2015

12

0.00198938

7

A

2016

12

0.00198938

8

A

2017

12

0.00198938

9

AA

2010

12

0.00198938

10

AA

2011

12

0.00198938

11

AA

2012

12

0.00198938

12

AA

2013

12

0.00198938

13

AA

2014

12

0.00198938

14

AA

2015

12

0.00198938

15

AA

2016

11

0.0018235984

16

AA

2017

12

0.00198938

17

AAAP

2016

12

0.00198938

18

AAAP

2017

12

0.00198938

19

AABA

2017

6

0.00099469

Shmuel · Posted 04-15-2018 12:55 AM

As much as I see the key variables to get distinct values (I used previously the term unique)

are: ROW TICKER YEAR which means that the code to bring the new variable should be:

*bring variable in from other data set; 
proc sql;
     create table want as 
      select  a.* , b.VOL
      from paper.ceo_firm as a
      left join paper.CSRP_Monthly_Stock_char4 as b
     on a.ROW = b.ROW and      /* <<< line added <<< */
          a.ticker = b.ticker and
          a.year = b.year;
quit;

sastuck · Posted 04-15-2018 09:20 AM

Thanks. Here's the new log:

 Errors (4)
 Warnings
 Notes (3)
 
 1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 70         
 71         *bring variable in from other data set;
 72         proc sql;
 73              create table want as
 74               select  a.* , b.VOL
 75               from paper.ceo_firm as a
 76               left join paper.CSRP_Monthly_Stock_char4 as b
 77              on a.ROW = b.ROW and
 78                   a.ticker = b.ticker and
 79                   a.year = b.year;
 ERROR: Column ROW could not be found in the table/view identified with the correlation name A.
 ERROR: Column ROW could not be found in the table/view identified with the correlation name A.
 ERROR: Column ROW could not be found in the table/view identified with the correlation name B.
 ERROR: Column ROW could not be found in the table/view identified with the correlation name B.
 NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
 80         quit;
 NOTE: The SAS System stopped processing this step because of errors.
 NOTE: PROCEDURE SQL used (Total process time):
       real time           0.00 seconds
       user cpu time       0.00 seconds
       system cpu time     0.00 seconds
       memory              8604.56k
       OS Memory           36272.00k
       Timestamp           04/15/2018 01:19:36 PM
       Step Count                        40  Switch Count  0
       Page Faults                       0
       Page Reclaims                     532
       Page Swaps                        0
       Voluntary Context Switches        8
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           0
       
 81         
 82         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 94

sastuck · Posted 04-15-2018 09:27 AM

@SASKiwi helped me out, by the way. In case you're curious, here's the code he came up with:

proc sql;
     create table ceo_firm as 
      select  a.* , b.VOL
      from paper.ceo_firm as a
      left join
     (select ticker
                ,year
                ,sum(VOL) as VOL
      from paper.CSRP_Monthly_Stock_char4
      group by ticker
              ,year
     ) as b
     on a.ticker = b.ticker and
     a.year = b.year;
quit;

Re: Need select variable from other data set in a particular dataset

Re: Need select variable from other data set in a particular dataset

Re: Need select variable from other data set in a particular dataset

Re: Need select variable from other data set in a particular dataset

Re: Need select variable from other data set in a particular dataset

Re: Need select variable from other data set in a particular dataset

Re: Need select variable from other data set in a particular dataset

Registration is open

Re: Need select variable from other data set in a particular dataset

Re: Need select variable from other data set in a particular dataset

Re: Need select variable from other data set in a particular dataset

Re: Need select variable from other data set in a particular dataset

Re: Need select variable from other data set in a particular dataset

Re: Need select variable from other data set in a particular dataset

Re: Need select variable from other data set in a particular dataset

Registration is open

SAS Training: Just a Click Away