🔒 This topic is solved and locked.
EJAA
Obsidian | Level 7

Hi All, 

I'm running the code below on two data sets, but it's taking extremely long to process and I don't know what's wrong with it. One data set has about 9,700 firms; the other lists the firms connected to each of those 9,700 firms. Because of the slowness, I tried running it for just firms 1-50, but even that had not finished after a day. The code gives no errors. Any advice on fixing the slowness? Thanks in advance for the help. Regards, ejaa.

 

/***** Getting the count of lpermno values in ase.help *****/
proc sql noprint; select count(lpermno) into :A from ase.help; quit;

%put &A ;

 

/******** Assign a sequence number to each observation **********/
Data ase.help ;
Set ase.help;
Retain seq 0 ;
Seq = seq + 1 ;
run ;

 

/*******Transpose help to get long format*********************/
proc transpose data=ase.help out=ase.help_trans;
by lpermno seq;
run;

 

Data ase.help_trans (Rename = (COL1 = companyid)) ;
Set ase.help_trans ;
Where COL1 ne . ;
Run ;

 

%let C1 = 1 ;
%let c2 = 50 ;


/*********** Separating each set of firms attached to an lpermno *************/

%macro rpt;

%Do R = &C1 %to &C2;

data ase.lp&R. (rename = (Seq = seq&R.)) ;
Set ase.help_trans ;
Where seq = &R ;
Run ;


proc sort data = ase.lp&R.; by companyid ; run ;
%end ;
%mend rpt ;

%rpt ;

 

/***********Merge separated sets to help1******************/

Proc sort data = ase.help1 ; By companyid ; run ;

%macro rpt1;
%Do R = &C1 %to &C2;
data ase.com_1 ;
Merge ase.lp1 - ase.lp&R. ;
By companyid ;
run ;
%end ;
%mend rpt1 ;

%rpt1 ;

 

Proc sort data = ase.com_1 ; By companyid ; run ;

data ase.com ;
Merge ase.help1(in= a drop = lpermno ) ase.com_1 ;
By companyid ;
If a ;
run ;


/**************Select different lpermno from help********************/

%macro rpt2;
%Do R = &C1 %to &C2;
Proc sort data = ase.lp&R. nodupkey out = ase.f&R.; By lpermno ; run ;
%end ;
%mend rpt2 ;

%rpt2 ;

 

/*************************** Merge lpermno from help to help1 *****************************/
Proc sort data = ase.help1 ; By lpermno ; run ;

%macro rpt3;
%Do R = &C1 %to &C2;
Data ase.Com_long1 ;
merge  ase.f1 - ase.f&R. ;
by lpermno ;
Run ;
%end ;
%mend rpt3 ;

%rpt3 ;

 

Data ase.Com_long ;
merge ase.help1 (in = a) ase.Com_long1 ;
by lpermno ;
If a ;
Run ;

 

/*************Getting different sets of firms out of help1***********************/
%macro rpt4;
%Do R = &C1 %to &C2;
data ase.sum&R. ;
Set ase.com ;
where seq&R. = &R ;
Run ;
Proc sort data = ase.sum&R. ; By date ; run ;
%end ;
%mend rpt4 ;

%rpt4 ;

 

/******* Getting mean volume by date in all the sets *************************/
%macro rpt5;
%Do R = &C1 %to &C2;
proc summary data = ase.sum&R. ;
by date;
var vol;
output out=ase.sum_&R. mean=beta_vol ;
run;
%end;
%mend rpt5 ;

%rpt5 ;


/********************separating lpermno to different sets from help1*********************************/
%macro rpt6;
%Do R = &C1 %to &C2;
data ase.price&R. ;
Set ase.Com_long ;
where seq&R. = &R ;
Run ;
%end ;
%mend rpt6 ;

%rpt6 ;


/*********************** Merging matching lpermno and attached firms by date *********************************/
%macro rpt7 ;
%Do R = &C1 %to &C2;
data ase.final&R.(drop = _TYPE_ _FREQ_ _NAME_ ) ;
Merge ase.price&R. (in = a) ase.sum_&R. (in = b) ;
By date ;
If a and b ;
run ;
%end ;
%mend rpt7 ;

%rpt7 ;


/************************Getting regression of each set**********************************/
%macro rpt8;
%Do R = &C1 %to &C2;
proc reg data=ase.final&R. outest = ase.result&R. ;
By lpermno ;
model vol = beta_vol stk ;
run;
%end ;
%mend rpt8 ;

%rpt8 ;

 

/**********getting results together****************************/

%macro rpt9;
%Do R = &C1 %to &C2;
Data ase.reg1 ;
Set ase.result1 - ase.result&R. ;
run;
%end ;
%mend rpt9 ;

%rpt9 ;

/**********************END********************************/

Accepted Solution
ChrisNZ
Tourmaline | Level 20

I couldn't agree more with @jklaverstijn if you have all these columns.

 

Try this, which turns the table around (long format instead of wide). The join takes about the same time as a hash-table approach on my example.

Modify it to suit your data.

 


/* Turn the wide connection list (D1-D2) into long format:
   one row per (firm, connected firm) pair */
data VERT;
  set HELP;
  array A[2] D1-D2;
  do I=1 to 2;
    D=A[I];
    if D then output;
  end;
  keep COMPANYID LPERMNO D;
run;

/* For each firm and date, average the volume of its connected firms */
proc sql;
  create table W as
  select source.COMPANYID, source.LPERMNO, source.DATE, source.VOLUME,
         mean(betavol.VOLUME) as BETAVOL
  from HELP1     source
         left join
       VERT      links
         on source.COMPANYID=links.COMPANYID
         inner join
       HELP1     betavol
         on  betavol.COMPANYID=links.D
         and betavol.DATE     =source.DATE
  group by 1,2,3,4
  order by 1,2,3,4
  ;
quit;

 

 


22 Replies
ChrisBrooks
Ammonite | Level 13

If you can't get the program to finish, my advice would be to run it step by step until you find the step that doesn't finish. My guess is it will be the PROC REG, but if you run it as I suggest, you'll at least know where to concentrate your efforts.

jklaverstijn
Rhodochrosite | Level 12

Examine your log and look at the "real time" lines after each step. Steps that show a high ratio of real time to CPU time are suspect and are candidates for improvement.

Also, explain your data in more detail. Are they SAS data sets or tables in a DBMS? Do you use indexes? Maybe sorting will help? Performance is an elaborate field of expertise, and a lot of information is required before anyone can help. Share a sample of your data and, above all, your logs in this forum. The first commandment of performance analysis is "Thou Shalt Read Log Files". Use the FULLSTIMER and MSGLEVEL=I options for more detailed info, and in the case of macros, use at least the MPRINT, SYMBOLGEN and MLOGIC options as well.
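For reference, the diagnostic options mentioned above can all be switched on at the top of the program; a minimal sketch:

```sas
/* Detailed step timings plus macro tracing, for diagnosis only */
options fullstimer   /* full resource usage (real, cpu, memory) per step */
        msglevel=i   /* extra notes: index usage, merge/sort details     */
        mprint       /* log the SAS code generated by macros             */
        symbolgen    /* log macro variable resolutions                   */
        mlogic;      /* trace %DO loops and %IF evaluations              */
```

These are all standard SAS system options; turn them off again once the slow step has been found, as they make the log very verbose.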

 

Hope this helps,

- Jan.

Astounding
PROC Star

Just spot-checking the program, the design looks like it will take about 25 times as long as it needs to. Let me give you just a couple of examples, so you can rethink the process.

 

Example 1:  %RPT

 

This step reads the data set ASE.HELP_TRANS 50 times, once for each iteration of the %DO loop.  Better planning would read the data set just once, and would create all 50 data sets with one pass through the data.  Just a brief example to give you the idea:

 

proc sort data=ase.help_trans;
  by companyid;
run;

data ase.lp1  (rename=(seq=seq1))
     /* ... ase.lp2 through ase.lp49 ... */
     ase.lp50 (rename=(seq=seq50));
  set ase.help_trans;
  if seq=1 then output ase.lp1;
  /* ... */
  else if seq=50 then output ase.lp50;
run;

 

The use of macro language would get a little more complex than what you have now.  Is that worth it, to cut out 95% of the time?

By sorting first, you can eliminate the need for 50 sorts (albeit on smaller data sets) later.
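For completeness, the 50 output statements need not be typed by hand: the macro language can generate the same single-pass DATA step. A sketch along these lines (the macro name and parameters are illustrative, not from the original program):

```sas
/* Split ase.help_trans into ase.lp1-ase.lp50 in ONE pass */
%macro split_once(from=1, to=50);
  %local r;
  data
    %do r=&from %to &to;
      ase.lp&r (rename=(seq=seq&r))
    %end;
  ;
    set ase.help_trans;
    /* route each observation to its seq-specific data set */
    %do r=&from %to &to;
      if seq=&r then output ase.lp&r;
    %end;
  run;
%mend split_once;

%split_once(from=1, to=50)
```

The data set is read once instead of 50 times, which is where the 95% saving comes from.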

 

Example 2:  %RPT1

 

It's not clear whether this step contains a typo, because it re-creates the same data set 50 times. Each iteration replaces ASE.COM_1 with a new version. If that's the intent, there is no need for the first 49 iterations. If it's a typo, the program is still inefficient. Assuming that the output should actually be named ASE.COM_&R, the process should be simplified so that the interior of the loop reads:


data ase.com_&R ;
  merge ase.com_%eval(&R-1) ase.lp&R. ;
  by companyid ;
run ;

 

The first iteration would need a little extra attention, so that you don't merge a data set with itself.  But once you get to the 50th iteration, there is no need to merge 50 data sets.  Just merge 2 data sets (one of which is the merged result from the 49th iteration).
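Put together, the rolling merge might look like this sketch, with the first iteration handled separately so that no data set is merged with itself (the macro name is illustrative):

```sas
/* Accumulate ase.lp1-ase.lp50 into ase.com_50 by rolling merges */
%macro rolling_merge(from=1, to=50);
  %local r;
  /* iteration 1: nothing accumulated yet, just copy */
  data ase.com_&from;
    set ase.lp&from;
  run;
  /* each later iteration merges the previous result with ONE new set */
  %do r=%eval(&from+1) %to &to;
    data ase.com_&r;
      merge ase.com_%eval(&r-1) ase.lp&r;
      by companyid;
    run;
  %end;
%mend rolling_merge;

%rolling_merge(from=1, to=50)
```

Each pass merges only two data sets, rather than re-merging all 50 on every iteration.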

 

In summary, these are just a couple of examples.  The program was designed looking at only small pieces of the process and using macro language to repeat them 50 times.  A better approach would look at the larger process, and design something that requires much less work on the computer's part.

 

 

Reeza
Super User

How many rows are your data sets?

EJAA
Obsidian | Level 7

The data set with the 9,700 firms has 19,725,636 rows, while the other has 7,927 rows. Thanks.

ChrisNZ
Tourmaline | Level 20

What @Astounding said.

You seem to be doing the same thing over and over again on tiny bits of data.

Please post a small data example of where you start from and what you want to obtain; it can probably be derived in a few steps.

 

 

EJAA
Obsidian | Level 7

Hi ChrisNZ

Thanks for the direction.

Please find the sample data attached. Thanks in advance for the help. Regards.

ChrisNZ
Tourmaline | Level 20

Please don't ask us to do all your work.
Please post a small data example of where you start from and what you want to obtain. Typically 100 observations is sufficient.

EJAA
Obsidian | Level 7

Thanks ChrisNZ.

Attached is a small sample of my data set showing the nature of the data. In the end, I want to obtain regression output for each firm. Thanks.

ChrisNZ
Tourmaline | Level 20

Feel free to not bother supplying the basic information that's necessary to answer your question, and to not get answers.

 

Here's an example of how it's done: https://communities.sas.com/t5/General-SAS-Programming/Creating-time-intervals-based-on-two-date-var...

EJAA
Obsidian | Level 7

Hi ChrisNZ.

Thanks once again. 

Following your direction, this is what my data set looks like:

 

Data set “help” lists the firms connected to each firm. d1-d3 are the companyIDs of the firms connected to, for instance, the firm with lpermno 10001.

 

CompanyID   lpermno   d1        d2
10675       10001     8433      1723398
620983      10002     576969    8433
959         10025     576969    1723398
9137        10028     2070824   18195

 

Data set “help1” contains daily volume and stk_returns for 9,700 firms.

 

CompanyID   lpermno   date         volume   stk_returns
10675       10001     01/03/2000   142      0.40
10675       10001     01/04/2000   245      0.45
10675       10001     01/05/2000   285      0.42
10675       10001     01/07/2000   181      0.43
10675       10001     01/10/2000   149      0.44
620983      10002     01/03/2000   200      0.30
620983      10002     01/04/2000   280      0.35
620983      10002     01/05/2000   230      0.32
620983      10002     01/07/2000   190      0.33
620983      10002     01/10/2000   240      0.34
8433        10003     01/03/2000   300      0.10
8433        10003     01/04/2000   380      0.15
8433        10003     01/05/2000   330      0.12
8433        10003     01/07/2000   390      0.13
8433        10003     01/10/2000   340      0.14
576969      10004     01/03/2000   400      0.80
576969      10004     01/04/2000   480      0.85
576969      10004     01/05/2000   430      0.82
576969      10004     01/07/2000   490      0.83
576969      10004     01/10/2000   440      0.84
1723398     10000     01/03/2000   500      0.60
1723398     10000     01/04/2000   580      0.65
1723398     10000     01/05/2000   530      0.62
1723398     10000     01/07/2000   590      0.63
1723398     10000     01/10/2000   500      0.64

 

Final table for the firm with lpermno 10001 (beta_volume is obtained from the firms connected to firm 10001):

CompanyID   lpermno   date         volume   stk_returns   beta_volume
10675       10001     01/03/2000   142      0.40          21
10675       10001     01/04/2000   245      0.45          30
10675       10001     01/05/2000   285      0.42          40
10675       10001     01/07/2000   181      0.43          84
10675       10001     01/10/2000   149      0.44          41

 

From data set help, I first create a portfolio of the firms connected to each firm, using data set help1 to obtain the portfolio's volume, called beta_volume (the mean of volume for each day). I then create a data set for each firm to obtain the "final table" shown above. In the end, I run a regression for each firm, regressing volume on beta_volume and stk_returns. This is what the code I posted earlier was doing. Thanks very much.

regards

ejaa.

ChrisNZ
Tourmaline | Level 20

Please provide a sample small enough that we know how the value 21 (beta_volume) that you are showing in the desired output is calculated. We can't see where it comes from here.

EJAA
Obsidian | Level 7

The beta_volume is the daily average (mean) volume of the firms connected to a specific firm. For instance, in the example given, firm 10001 is connected to two firms, 8433 and 1723398, so I take the average of the daily volume of those two firms and link it to firm 10001 by date to produce the final table used in the regression.

 

In my earlier code, this is what I used to obtain the mean for the firms connected to a specific firm.

 

%macro rpt5;
%do R = &C1 %to &C2;
  proc summary data=ase.sum&R.;
    by date;
    var vol;
    output out=ase.sum_&R. mean=beta_vol;
  run;
%end;
%mend rpt5;

%rpt5;

 

Thanks very much. 

 

ChrisNZ
Tourmaline | Level 20
I am about to quit trying. The small example you supplied does not show the link you mention between firm 10001 and firms 8433 & 1723398, and does not show volumes that would explain how the value 41 is obtained.


Discussion stats
  • 22 replies
  • 13007 views
  • 9 likes
  • 6 in conversation