BookmarkSubscribeRSS Feed
Sasadomo
Calcite | Level 5

I have a data set with event timestamp and cusip for the stock and I want to calculate the mean volume around the timestamp using crsp.dsf file.  I am thinking of loop every row in the first data set and use the timestamp in each row to match and calculate the mean volume in crsp for that company for a fixed time interval.

 

But it seems that do loop cant work with two data sets.  Anyone have any ideas?

 

 

10 REPLIES 10
Community_Guide
SAS Moderator

Hello @Sasadomo,


Your question requires more details before experts can help. Can you revise your question to include more information? 

 

Review this checklist:

  • Specify a meaningful subject line for your topic.  Avoid generic subjects like "need help," "SAS query," or "urgent."
  • When appropriate, provide sample data in text or DATA step format.  See this article for one method you can use.
  • If you're encountering an error in SAS, include the SAS log or a screenshot of the error condition. Use the Photos button to include the image in your message.
    use_buttons.png
  • It also helps to include an example (table or picture) of the result that you're trying to achieve.

To edit your original message, select the "blue gear" icon at the top of the message and select Edit Message.  From there you can adjust the title and add more details to the body of the message.  Or, simply reply to this message with any additional information you can supply.

 

edit_post.png

SAS experts are eager to help -- help them by providing as much detail as you can.

 

This prewritten response was triggered for you by fellow SAS Support Communities member @PeterClemmensen

.
Sasadomo
Calcite | Level 5

 

In the second level of the loop, I would like to perform the following loop for every event date stamp:

 

data meanvol;
do until (last.cusip); 
	set work.forms41;
	by cusip; 
	if first.cusip then call missing(total,n);
	if (secd-45) <= date <= (secd+45) then 
	do;
	total=sum(total,vol); 
	n=sum(n,1); 
	end;
end;
avgvoldsf=total/n;
run;

 

Sasadomo
Calcite | Level 5
where secd is the date part of the event timestamp contained in work.forms41
mkeintz
PROC Star

Show us what the 2 datasets look like.  Does your crsp dsf file have 8-digit CUSIP? (stock specific, not company specific), or does it use  permno as an id?  What are the relevant variable names in BOTH of the data sets of interest?  Help us help you.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Sasadomo
Calcite | Level 5

Thank you for your reply.

 

Yes I have the 8-digit cusip in both data sets. The following is the what the data sets looks like: This is the first data set with the event timeImage_20180403151751.png

 And this is the second data set with the the daily volume. 

Image_20180403151847.png

What I attempted to do is loop every row in the first data set and for each row, match company according  to the cusip in the second data set and calculate 90 days (45 days before and 45 days after ) mean for this company around the event day.

mkeintz
PROC Star

Your crsp.dsf dataset is almost certainly sorted by permno, not by cusip, so you either have to sort it by cusip/date (expensive given the size of the crsp.dsf dataset), or sort your form41 data set by permno.  But that dataset probably doesn't have permno, correct?  You might want to get permno's for your form41 dataset.

 

Also, before I suggest an approach, I see you have multiple events per cusip, and those event are often less the 45 days from each other.  So do you want 90 records for EACH event? 

 

Or if a sequence of events have no gaps larger the 45 days, do you want a single group of records starting 45 days before the earliest event in the sequence through 45 days after the latest event in the sequence.

 

Finally, are your CUSIPs in FORM41 the latest cusip for a given stock? Or are they merely contemporaneous with the SECD data value?  It's possible that a company has merged or split since SECD, in which case CUSIPs change, but CRSP continues the permno for the surviving company.

 

 

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Sasadomo
Calcite | Level 5
- Yes crsp.dsf is currently ordered by permno and I will order it by cusip if I am using cusip.

-For accuracy, I would prefer 90 records for each event (and the event dates are pretty disperse so making 15 days groups might be unnecessary)

-they are the latest cusip form the master table. I thought of using gvkey which is unique but it shouldn't be big issue if there's change or split as I cut off the events that are not consistent of the first filling date and last filling date.

-
Sasadomo
Calcite | Level 5

For example, for first row in the first data set, there's a event on 2002/06/06 at 10:28:44 for company 00036110. I would like to match company 00036110 in the second data set and find find the 90 days mean for vol(in the second data set) around event date 2002/06/06. And repeat this process for every row in the first data set.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Post example test data in the form of a datastep.  You can follow the post at:

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

 

Then show what the output should be from that test data.  It looks like your doing a simple sum() and count() based on a group of date between two groups.  It may therefore be easier to do:

data meanvol;
  set work.forms41;
  by cusip;
retain total n; if first.cusip then call missing(total,n); if (secd-45) <= date <= (secd+45) then do; total=sum(total,vol); n=sum(n,1);
end; if last.cusip then do;
avgvoldsf=total/n;
output;
end; run;
Sasadomo
Calcite | Level 5

May I ask why it would be easier to use the code provided? I think they will produce the same result.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 779 views
  • 0 likes
  • 4 in conversation