I have a data set with event timestamp and cusip for the stock and I want to calculate the mean volume around the timestamp using crsp.dsf file. I am thinking of loop every row in the first data set and use the timestamp in each row to match and calculate the mean volume in crsp for that company for a fixed time interval.
But it seems that do loop cant work with two data sets. Anyone have any ideas?
Hello @Sasadomo,
Your question requires more details before experts can help. Can you revise your question to include more information?
Review this checklist:
To edit your original message, select the "blue gear" icon at the top of the message and select Edit Message. From there you can adjust the title and add more details to the body of the message. Or, simply reply to this message with any additional information you can supply.
SAS experts are eager to help -- help them by providing as much detail as you can.
This prewritten response was triggered for you by fellow SAS Support Communities member @PeterClemmensen
.
In the second level of the loop, I would like to perform the following loop for every event date stamp:
data meanvol;
do until (last.cusip);
set work.forms41;
by cusip;
if first.cusip then call missing(total,n);
if (secd-45) <= date <= (secd+45) then
do;
total=sum(total,vol);
n=sum(n,1);
end;
end;
avgvoldsf=total/n;
run;
Show us what the 2 datasets look like. Does your crsp dsf file have 8-digit CUSIP? (stock specific, not company specific), or does it use permno as an id? What are the relevant variable names in BOTH of the data sets of interest? Help us help you.
Thank you for your reply.
Yes I have the 8-digit cusip in both data sets. The following is the what the data sets looks like: This is the first data set with the event time
And this is the second data set with the the daily volume.
What I attempted to do is loop every row in the first data set and for each row, match company according to the cusip in the second data set and calculate 90 days (45 days before and 45 days after ) mean for this company around the event day.
Your crsp.dsf dataset is almost certainly sorted by permno, not by cusip, so you either have to sort it by cusip/date (expensive given the size of the crsp.dsf dataset), or sort your form41 data set by permno. But that dataset probably doesn't have permno, correct? You might want to get permno's for your form41 dataset.
Also, before I suggest an approach, I see you have multiple events per cusip, and those event are often less the 45 days from each other. So do you want 90 records for EACH event?
Or if a sequence of events have no gaps larger the 45 days, do you want a single group of records starting 45 days before the earliest event in the sequence through 45 days after the latest event in the sequence.
Finally, are your CUSIPs in FORM41 the latest cusip for a given stock? Or are they merely contemporaneous with the SECD data value? It's possible that a company has merged or split since SECD, in which case CUSIPs change, but CRSP continues the permno for the surviving company.
For example, for first row in the first data set, there's a event on 2002/06/06 at 10:28:44 for company 00036110. I would like to match company 00036110 in the second data set and find find the 90 days mean for vol(in the second data set) around event date 2002/06/06. And repeat this process for every row in the first data set.
Post example test data in the form of a datastep. You can follow the post at:
Then show what the output should be from that test data. It looks like your doing a simple sum() and count() based on a group of date between two groups. It may therefore be easier to do:
data meanvol;
set work.forms41;
by cusip;
retain total n;
if first.cusip then call missing(total,n);
if (secd-45) <= date <= (secd+45) then do;
total=sum(total,vol);
n=sum(n,1);
end;
if last.cusip then do;
avgvoldsf=total/n;
output;
end;
run;
May I ask why it would be easier to use the code provided? I think they will produce the same result.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.