BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
EJAA
Obsidian | Level 7

Thanks ChrisNZ for the great efforts.

 

Please when we look into dataset "help" we see that firm 10001 is having firms with in colum d1 and d2 with companyid 8433 & 1723398. Thus these two firms are connected to firm 10001. On the beta volume sorry abt that, it should be as shown below. Thus the average daily volume of the two firms connected to firm 10001. Thanks once again.regards

 

Final Table for firm with id 10001.

CompanyID   lpermno           date              volume           stk_returns        beta_volume (obtained from firms connected to firm 10001)
10675            10001           01/03/2000        142                     0.40               400 (thus 300+500/2)
10675            10001           01/04/2000         245                    0.45                480
10675            10001            01/05/2000        285                    0.42                430
10675            10001           01/07/2000        181                    0.43                  490

10675            10001           01/10/2000        149                     0.44                 420

ChrisNZ
Tourmaline | Level 20

We are getting there.

 

Please tell me how this performs:


data HELP;
 input COMPANYID   LPERMNO  D1  D2  ;                       
cards;
10675    10001   8433     1723398   
run;
 
data HELP1;
input COMPANYID   LPERMNO   DATE : mmddyy10.  VOLUME  STK_RETURNS ;
cards;
10675    10001     01/03/2000    142     0.40
10675    10001     01/04/2000    245     0.45         
10675    10001     01/05/2000    285     0.42
10675    10001     01/07/2000    181     0.43      
10675    10001     01/10/2000    149     0.44
8433     10003     01/03/2000    300     0.10
8433     10003     01/04/2000    380     0.15         
8433     10003     01/05/2000    330     0.12
8433     10003     01/07/2000    390     0.13      
8433     10003     01/10/2000    340     0.14
1723398  10000     01/03/2000    500     0.60
1723398  10000     01/04/2000    580     0.65         
1723398  10000     01/05/2000    530     0.62
1723398  10000     01/07/2000    590     0.63      
1723398  10000     01/10/2000    500     0.64
run;

data WANT;
  set HELP1;
  if _N_=1 then do;
    dcl hash H(dataset:'HELP');
    H.definekey('LPERMNO');
    H.definedata('D1','D2');
    H.definedone();
    dcl hash H1(dataset:'HELP1(rename=(VOLUME=BETA))');
    H1.definekey('COMPANYID','DATE');
    H1.definedata('BETA');
    H1.definedone();
    call missing(D1,D2,BETA);
  end;
  call missing(D1,D2,BETA,BETA1,BETA2);
  rc=H.find();
  rc=H1.find(key:D1,key:DATE );
  BETA1=BETA;
  rc=H1.find(key:D2,key:DATE );
  BETA2=BETA;
  AVG=(BETA1+BETA2)/2;
run;

I left the intermediate calculations in the final table.

 

COMPANYID LPERMNO DATE VOLUME STK_RETURNS D1 D2 BETA BETA1 BETA2 rc AVG
10675 10001 14612 142 0.40 8433 1723398 500 300 500 0 400
10675 10001 14613 245 0.45 8433 1723398 580 380 580 0 480
10675 10001 14614 285 0.42 8433 1723398 530 330 530 0 430
10675 10001 14616 181 0.43 8433 1723398 590 390 590 0 490
10675 10001 14619 149 0.44 8433 1723398 500 340 500 0 420

 

 

ChrisNZ
Tourmaline | Level 20

@EJAA So?

 

Using 8k and 20m rows respectively, this takes 1min 20s on my machine.

 

data HELP (keep=COMPANYID LPERMNO D1 D2)
     HELP1(keep=COMPANYID LPERMNO DATE VOLUME);
  do COMPANYID= 1 to 8000;
    LPERMNO=COMPANYID+100000;
    D1     =int(ranuni(0)*8000);
    D2     =int(ranuni(0)*8000);
    output HELP;
    do DATE='01jan2010'd to '01jan2017'd;
      VOLUME=int(ranuni(0)*999);
      output HELP1;
    end;
  end;
  format DATE date9.;
run;


data WANT;
  if _N_=1 then do;
    dcl hash H(dataset:'HELP');
    H.definekey('LPERMNO');
    H.definedata('D1','D2');
    H.definedone();
    dcl hash H1(dataset:'HELP1(rename=(VOLUME=BETA))');
    H1.definekey('COMPANYID','DATE');
    H1.definedata('BETA');
    H1.definedone();
  end;
  set HELP1;
  call missing(D1,D2,BETA,BETA1);
  RC    = H.find();
  RC    = H1.find(key:D1,key:DATE );
  BETA1 = BETA;
  RC    = H1.find(key:D2,key:DATE );
  AVG   = (BETA+BETA1)/2;
  keep COMPANYID LPERMNO DATE VOLUME AVG;
run;

NOTE: There were 8000 observations read from the data set WORK.HELP.
NOTE: There were 20464000 observations read from the data set WORK.HELP1.
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
5116 at 45:15
NOTE: There were 20464000 observations read from the data set WORK.HELP1.
NOTE: The data set WORK.WANT has 20464000 observations and 10 variables.
NOTE: Compressing data set WORK.WANT decreased size by 27.29 percent.
Compressed is 9101 pages; un-compressed would require 12517 pages.
NOTE: DATA statement used (Total process time):
real time 1:19.48
user cpu time 1:16.51
system cpu time 2.65 seconds
memory 1313268.82k
OS Memory 1341000.00k
Step Count 532 Switch Count 267

 

 

 

EJAA
Obsidian | Level 7

Hi ChrisNZ

 

I sincerely apologies for the late update. I had few issues yesterday and was off hence me not updating you with reference to the codes you sent. Thanks very much for the understanding and really appreciate the  great efforts. 

 

Please just as you opinionated earlier on, we are getting there.

 

I have run the code and it worked perfectly on the sample above and is very fast of which i tried to follow the codes so i can modify the variables to enable me apply the codes to the bigger data i have but didnt work.

 

Please as i could only indicate the nature of the data when posting the question, I couldnt show much data but in my sample data HELP, the column runs from d1-d313 with most have value (companyid). Thus a specific firm could be linked to more than one firm. For instance a specific firm could be linked to just one while others two while others 20 etc. So firm with id 10001 is linked to more than two firms in the actual but couldnt show all as you requested i only post few sample for assessment and direction.Hence i have something like this below for data help

CompanyID   lpermno           d1                 d2                         d3                 d4         d5                  d6             d7      d8       d9       d10   
10675            10001           8433               1723398              2070824          18195
620983           10002          576969           8433   
959                 10025           576969         1723398    
9137               10028           2070824       18195

8433                12540          576969

 

Also please in the data help1, aside this variables, i have other variables price, market_returns in addition to

CompanyID   lpermno       date           volume      stk_returns. 

 

I have tried several times trying to follow your codes but still getting it wrong. 

 

Please we are almost there and i seek for further direction to enable me apply on the main data as the codes work on the small sample data posted earlier. Thanks very much ChrisNZ for this great assistance.regards ejaa. 

    

jklaverstijn
Rhodochrosite | Level 12

Hi Ejaa,

 

More a comment on the datamodel: whenever I see a table that has columns like your d1-d313 a get the feeling that a transposition would make any processing a lot easier and your dataset less sparse. Eg, what happens if there is need for a d314? Compare these approaches:

 

CompanyID lpermno   d1   d2   
0675      10001     8433 1723398 

 

with

CompanyID  lpermno   d
0675       10001     8433
0675       10001     1723398 

See this paper for a more indepth discussion. 

 

It would be an easy transformation and could greatly help speed up or at least simplify your code.

 

Regards,

- Jan

EJAA
Obsidian | Level 7

Hi jklaverstijn

Thanks for the paper and directions. Really helped and studied a lot. regards.

ChrisNZ
Tourmaline | Level 20

I couldn't agree more with @jklaverstijn if you have all these columns.

 

Try this which turns the table around. The join takes about the same time as the hash table on my example.

Modify it to suit your data.

 


data VERT;
  set HELP;
  array A[2] D1-D2;
  do I=1 to 2;
    D=A[I];
    if D then output;
  end;
  keep COMPANYID LPERMNO D;
run;  

proc sql ;
  create table W as 
  select source.COMPANYID, source.LPERMNO, source.DATE, source.VOLUME, mean(betavol.VOLUME) as BETAVOL 
  from HELP1     source
         left join 
       VERT      links
         on source.COMPANYID=links.COMPANYID
         inner join 
       HELP1     betavol
         on  betavol.COMPANYID=links.D
         and betavol.DATE     =source.DATE
  group by 1,2,3,4
  order by 1,2,3,4
;
quit; 

 

 

EJAA
Obsidian | Level 7

Hi ChrisNZ,

I followed your codes and directions closely and glad to notify you that it did work perfectly. The problem is now resolved and it worked really faster.Thanks very much  ChrisNZ and all who helped solve the query. 

regards ejaa.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 22 replies
  • 13166 views
  • 9 likes
  • 6 in conversation