BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kt_uwa1990
Calcite | Level 5

Hi all

I am having difficulty engineering a code on SAS. I am new to SAS and finding this extremely difficult. I would greatly appreciative if you could help me here.


I have outlined my objectives below:


-For each PERMNO (company identifier) with DISTCD 1262 or 1272 (t=0), calculate the daily standard deviation in RET for two years before (t= -1,-2) and two years after (t= +1,+2) surrounding the day of this event.

-Then create a column called BEFORE. This is an average daily standard deviation across the years (–1,–2).

-Then create a column called AFTER. This is an average daily standard deviation across the years (+1, +2).

-Then subtract BEFORE-AFTER, to produce a difference (DIFF).


A successful code should let me refer back to a DISTCD of 1272 or 1262 and see the DIFF at that specific date.


I have been stuck for a while on this and find it challenging. I highly encourage help and useful code on this problem (please see data attachment


Thanks,

Katy

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

That is more easy. Remove the case clause in SQL.

proc sql;
create table temp as
 select h.*,
      (select std(ret) from test_vol where permno = h.permno and date between intnx('year',h.date,-1,'s') and h.date) as std_one_year_before ,
      (select std(ret) from test_vol where permno = h.permno and date between intnx('year',h.date,-2,'s') and h.date) as std_two_year_before ,
      (select std(ret) from test_vol where permno = h.permno and date between h.date and  intnx('year',h.date,1,'s')) as std_one_year_after ,
      (select std(ret) from test_vol where permno = h.permno and date between h.date and  intnx('year',h.date,2,'s')) as std_two_year_after
  from test_vol as h;quit;

Ksharp

Message was edited by: xia keshan

View solution in original post

20 REPLIES 20
Ksharp
Super User

You should post some small sample data and the result you want . Your attachment is to huge ,I am not going to work with it.

kt_uwa1990
Calcite | Level 5

Ksharp

I am sorry about that, forgive me, I am still famailarising myself with SAS communities and did not realise that this database size is too large. Thank you for your comments, I have since followed your instructions and reduced the sample size by 2/3. Please see the attached data.

In response to the result I want, the table produced should look something like this (see the attached picture). Please note this is a hypothetical representation as I still need to create the code to find my results.

table_desired.JPG

You will note that SDEV should be calculated for each observation. SDEV is just the standard deviation between todays return and yesterday’s return.  For example, the standard deviation of RET 0.0021 and 0.0033 is 0.000849.

While BEFORE, AFTER, and DIFF are just calculated for those observations with either 1262 or 1272 in the DISTCD. I hope my explanations of the variables attached to the earlier post are adequate. If not, please let me know and I will provide further detail. BEFORE and AFTER are just averages of the PERMNOS previous two year SDEV and post two year SDEV.

Once again, I thank you for your response to my query. I very much hope you will be able to help me in solving my problem.

RichardinOz
Quartz | Level 8

I'm afraid your test_vol data table seems to be corrupt.  You have lots of null values and repeated rows, and the data is disordered.  I ran the following code on your second data and got some results but I don't think that is what you expected.

Proc import

        datafile = 'C:\Users\Public\Temp\test_vol.dbf'

        Out    = test_vol

        DBMS = DBF

        Replace

        ;

Run ;

Data Stdcalc (Rename = (DateRet = Date DclrRet = DclrDt DistRet = Distcd PrcRet = Prc Check = Ret)) ;

    Length Permno DateRet DclrRet DistRet PrcRet Check before after diff 8 ;

    Set test_vol ;

    Format    DateRet    DclrRet Date9. ;

    By    Permno Notsorted ;

    DateRet    =    Lag (Date) ;

    DistRet =   Lag (Distcd) ;

    DclrRet    =    Lag (Dclrdt) ;

    PrcRet  =   Lag (Prc) ;

    Check    =    Lag (Ret) ;

    If     First.Permno

      Then Check    = . ;

    Prior    =    Lag (Check) ;

    Before    =     Std (Prior, check) ;

    After    =    Std (Check, Ret) ;

    Diff    =    Before - After ;

    If    Prior = . Or Diff = . then    delete ;

    If    DistRet    > . then output ;

    Keep Permno DateRet DclrRet DistRet PrcRet Check before after diff ;

Run ;

Data    Want ;

    Set    Stdcalc (Where = (Distcd IN (1262, 1272))) ;

Run ;

Permno    Date           DclrDt           Distcd    Prc        Ret           before   after                  diff

11648      29DEC2010 21DEC2010   1262      20.1755  0.023271   0          0.0302903332  -0.030290333

10239      16NOV2007 07NOV2007  1272      27.41      0.015362  0           0                     0

10239      16NOV2007 07NOV2007  1272      27.41      0.015362  0           0                     0

10239      16NOV2007 07NOV2007  1272      27.41      0.015362  0           0                     0

etc

Richard in Oz

kt_uwa1990
Calcite | Level 5

Hi Richard in Oz

First, thank you very much for your assistance. I greatly appreciate your help with this!

-I noticed you mentioned there seems to be a lot of duplicates in my data.Maybe we could use the following code to prevent this?

Data test_vol; set test; vol; if Permno=lag(permno) and date=lag(date) and distcd=lag(distcd) then delete; run;

-Also yes there should be a few blanks for distcd, this is OK. However if there are blanks in RET it is worthwhile deleting these also.

Data test_vol; set test; vol; if RET=. then delete; run;

-Ordering the data can be obtained by inserting the code:

PROC sort data=test_vol; by PERMNO DATE; run;

I am a bit troubled by the results generated your code above. I notice there is no SDEV column. Do you know why there are so many zeros in before, after and diff? I imagine there should be at least some SDEV two years before to populate the BEFORE estimate. Likewise, there should be at least some SDEV two years after, to populate the AFTER estimate. Though I'm very happy we are getting somewhere! Progress is being made! Thank you very much for your contribution hopefully we can tackle this problem together!

-Katy

RichardinOz
Quartz | Level 8

My big concern with the data is that there is no unique key combination.  Initially I thought that missing values for DISTCD (not just a few - most are missing) did not matter because I could find the corresponding value of PERMNO but that did not work because it proved to be a many to many relationship.  It seemed at first from your data that maybe there was an implicit order, PERMNO and then DATE but even that failed in the second data set.  It would worry me that you are trying to calculate a fairly complex result from data that is so shaky.  I urge you to make sure the data is clean before you calculate and use these results.

Richard in Oz

kt_uwa1990
Calcite | Level 5

Thank you for your concern Richard, do you have any suggestions to improving the PERMNO and DATE order? Do you mean that these are not consecutive? This is not too much of a problem because the focus here is on average volatiles (standard deviations). My main objective is just to compare the typical size of the standard deviation two years before with two years after.

Ksharp

I am having difficulty with the 1st code produced actually. It appears that:

"mean_two_year_after"  has the same value as "mean_two_year_before". This cannot be correct. I suspect there is a problem with the code, maybe at the standard devation calculations:

proc sql;

create table temp as

select h.*,

      case when  h.distcd in (1262 1272) then (select std(ret) from have where permno = h.permno and date between intnx('year',-1,h.date,'s') and h.date) else . end as std_one_year_before ,

      case when  h.distcd in (1262 1272) then (select std(ret) from have where permno = h.permno and date between intnx('year',-2,h.date,'s') and h.date) else . end as std_two_year_before ,

      case when  h.distcd in (1262 1272) then (select std(ret) from have where permno = h.permno and date between h.date and  intnx('year',1,h.date,'s')) else . end as std_one_year_after ,

      case when  h.distcd in (1262 1272) then (select std(ret) from have where permno = h.permno and date between h.date and  intnx('year',2,h.date,'s')) else . end as std_two_year_after

  from test_vol as h;

I see that it contains: "  case when  h.distcd in (1262 1272)" this worries me.

I shall provide an example of what I am looking to do, if permno 10008 had 1262 on 03/07/2010... I would like to calculate permos 10008 average standard deviation for 03/07/2008 to 03/07/2010, this is known as before (note the daily standard deviations used in the mean computation do not have to be associated  with their own 1262 and 1272 distcd). Then, following the same pattern, I would also like to calculate permnos 10008  average standard deviation from 03/07/2010 to 03/07/2012. If my data doesn't give me the full two years before, and after, that is OK just utilize the observations that I have. Are you sure the code above accurately does this?

I have attempted to use your second code without the 1262 and 1272 but it appears that this is taking too long with my data.

art297
Opal | Level 21

I'm just looking at your question, and the responses, for the first time thus forgive me if I ask some dumb questions.

Your example wasn't very helpful as your data doesn't have any records for permno 10008.

Is DCLRDT supposed to be used for anything?  I noticed that it wasn't used in any of the suggested solutions.

Permno 10239, has a number of records with DISTCD codes that meet your criteria.  And, of those, the date ranges would include the same records regardless of whether one uses the date or the DCLRDT field as the base date.  Are you expecting to treat each one of those separately?

Regardless of which date is supposed to be used, should the data for that particular date for a particular permno be used for the std devs and, if so, for the years before or the years after or both?  Plus, regarding the specification one year before, is that 1 day to one year before, or one year to two years minus 1 day before?

KSharp's code refers to a file called have.  Is there another file I didn't see in your post?

Also, KSharp had a slight error (I think) in his use of the intnx function which is why you were getting such odd results.

That said, the following may be incorrect as it depends upon your answers to the above questions, whether I correctly understand the problem, and whether my logic is actually correct in accomplishing the task.  The code is definitely NOT guaranteed or fully tested:

proc sql;

  create table met_criterion as

    select distinct permno,DCLRDT

      from test_vol

        where DISTCD in (1262,1272)

  ;

  create table want as

    select h.*,

      (select std(ret) from test_vol where permno = h.permno and date between intnx('year',h.DCLRDT,-1,'s') and h.DCLRDT) as std_one_year_before ,

      (select std(ret) from test_vol where permno = h.permno and date between intnx('year',h.DCLRDT,-2,'s') and h.DCLRDT) as std_two_year_before ,

      (select std(ret) from test_vol where permno = h.permno and date between h.DCLRDT and  intnx('year',h.DCLRDT,1,'s')) as std_one_year_after ,

      (select std(ret) from test_vol where permno = h.permno and date between h.DCLRDT and  intnx('year',h.DCLRDT,2,'s')) as std_two_year_after

  from met_criterion as h;

quit;

kt_uwa1990
Calcite | Level 5

Hi art

Thank you very much for your questions and helpfulness I was beginning to worry this problem may not get resolved.

In response to your questions:

DCLRDT is the date that the distribution code (1262 or 1272) is declared to the market. This being said, this date should act like the premise for the two years before and two years after calculation. I.e.: if the DCLRDT was 01-01-2010 then two years before should extend to 01-01-2008 for DATE and two years after should extend to 01-01-2012 for DATE, with respect to the averages for standard deviation. However the DCLRDT and the DATE for 1262 and 1272 should not be vastly different from one another. Therefore for simplicities sake I encourage using the DATE as the basis or starting point.

If Permno 10239 has a number of 1262s and 1272s which overlap in the +/-2 year date ranges this is ok. The overlap of SDEV information for the same permno is fine. But I would like to treat each 1262 and 1272 separately.

I have supplied no file called “have”. To be frank, I am only interested in 2 before and after analysis, not 1yr. I am not sure why this has been included.

Does your code supplied satisfy the conditions discussed above?

As there are concerns with the earlier posts. I would also like to ask you: how would your code change if we remove the 1262 and 1272 restriction?

That is, for each PERMNO calculate the daily standard deviation (SDEV) for RET two years before (t= -1,-2) and two years after (t= +1,+2) surrounding each date recorded, or each observation.

So the column called BEFORE, the average daily standard deviation across the years (–1,–2) is calculated for all observations.

And the column called AFTER. This is an average daily standard deviation across the years (+1, +2) is also calculated for all observations.

-I then subtract BEFORE-AFTER, to produce a difference column (DIFF) which should be produced for all the observations.

If is is possible to see code for:

(1) With 1262 and 1272 restriction

(2) And without 1262 and 1272 restriction

I would be unduly grateful. Thank you very much once again.

art297
Opal | Level 21

No, my code didn't meet all of your requirements and I discovered some additional complications in your data.  When the distcd codes were added numerous duplicate records were introduced.

The following is an attempt to get rid of the duplicates without losing any of the critical information.  You will have to check if it does what I had intended for it to do.

That said, the following does (I think at least) meet your various requirements.  However, as for your question about expanding your data records to include calculations for each day, I'll have to leave that for someone more familiar with analyzing this type of data.

Here is the approach I took:

PROC IMPORT OUT= WORK.test_vol_in

            DATAFILE= "C:\art\test_vol.dbf"

            DBMS=DBF REPLACE;

     GETDELETED=NO;

RUN;

data contains_distcd;

  set test_vol_in;

  if not missing(distcd);

run;

data does_not_contain_distcd;

  set test_vol_in;

  if missing(distcd);

run;

proc sort data=contains_distcd;

  by permno date ret;

run;

data contains_distcd_cleaner;

  set contains_distcd;

  retain contains_1262 contains_1272 contains_other;

  by permno date;

  if first.date then do;

    call missing(contains_1262);

    call missing(contains_1272);

    call missing(contains_other);

  end;

  if distcd eq 1262 then contains_1262=1;

  else if distcd eq 1272 then contains_1272=1;

  else contains_other=1;

  if last.date then output;

run;

data both;

  set does_not_contain_distcd contains_distcd_cleaner;

run;

proc sql;

  create table met_criterion_1262 as

    select distinct permno,date

      from both

        where contains_1262=1

  ;

  create table met_criterion_1272 as

    select distinct permno,date

      from both

        where contains_1272=1

  ;

  create table want_1262 as

    select h.*,

      (select std(ret) from both where permno = h.permno and date between intnx('year',h.date,-2,'s') and h.date) as before ,

      (select std(ret) from both where permno = h.permno and date between h.date and intnx('year',h.date,2,'s')) as after,

      calculated before-calculated after as diff

         from met_criterion_1262 as h

  ;

  create table want_1272 as

    select h.*,

      (select std(ret) from both where permno = h.permno and date between intnx('year',h.date,-2,'s') and h.date) as before ,

      (select std(ret) from both where permno = h.permno and date between h.date and intnx('year',h.date,2,'s')) as after,

      calculated before-calculated after as diff

         from met_criterion_1272 as h

  ;

quit;

kt_uwa1990
Calcite | Level 5

This code is thorough and works swimmingly! Thank you thank you thank you, these were the kind of results I was expecting!!

Just a quick question. As I also want to find the +/- 2 yr average SDEV for all stocks (regardless of 1262 and 1272 or not) how should I tweak the code? You will notice Ksharp provides a solution, but due to past concerns, is this an incorrect approach to take? I am just thinking surely this approach must be easier than that discussed above. Hmm, there must be a solution for this!

Btw, if duplicate data arises this can be easily solved using: data test_vol_in; set test_vol_in; if permno=lag(permno) and date=lag(date) then delete; run;

I am pleased progress is being made, you provide a very valuable contribution art! Almost there!

-Katy

art297
Opal | Level 21

Katy, Glad to have been able to be of help but, like I said, this is not my area.  I am a psychologist, not a statistician or a economist.  I'm sure that others can provide more correct guidance, at this point, than I can.

If you are looking for before and after effects, I would think that you have to decide what constitutes the border between before and after.  If it is just ANY of the codes in the field from which you selected the other two, then it is an easy problem to solve.  But, not knowing what any of those codes represent, I am hesitant to suggest an approach.

kt_uwa1990
Calcite | Level 5

No worries at all, you have been exceptionally helpful. I will test out Ksharps code and keep playing with this data!

Ksharp
Super User

Arthur.T

Thank you to point out my mistake.

I think I am too long to learn SAS .

Best Regard.

Ksharp

proc sql;
create table temp as
 select h.*,
      (select std(ret) from test_vol where permno = h.permno and date between intnx('year',h.date,-1,'s') and h.date) as std_one_year_before ,
      (select std(ret) from test_vol where permno = h.permno and date between intnx('year',h.date,-2,'s') and h.date) as std_two_year_before ,
      (select std(ret) from test_vol where permno = h.permno and date between h.date and  intnx('year',h.date,1,'s')) as std_one_year_after ,
      (select std(ret) from test_vol where permno = h.permno and date between h.date and  intnx('year',h.date,2,'s')) as std_two_year_after
  from test_vol as h;quit;
Ksharp
Super User

OK. If I understand what you mean.

proc sql;
create table temp as
 select h.*,
      case when  h.distcd in (1262 1272) then (select std(ret) from have where permno = h.permno and date between intnx('year',-1,h.date,'s') and h.date) else . end as std_one_year_before ,
      case when  h.distcd in (1262 1272) then (select std(ret) from have where permno = h.permno and date between intnx('year',-2,h.date,'s') and h.date) else . end as std_two_year_before ,
      case when  h.distcd in (1262 1272) then (select std(ret) from have where permno = h.permno and date between h.date and  intnx('year',1,h.date,'s')) else . end as std_one_year_after ,
      case when  h.distcd in (1262 1272) then (select std(ret) from have where permno = h.permno and date between h.date and  intnx('year',2,h.date,'s')) else . end as std_two_year_after
  from test_vol as h;

create table want as
 select *,
       mean(std_one_year_before) as mean_one_year_before ,
       mean(std_two_year_before) as mean_two_year_before ,
       calculated  mean_one_year_before - calculated mean_two_year_before as diff_before ,
       mean(std_one_year_after) as mean_one_year_after ,
       mean(std_two_year_after) as mean_two_year_after ,
       calculated  mean_one_year_after - calculated mean_two_year_after as diff_after 
  from temp
   group by permno;
quit;



Ksharp

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 20 replies
  • 1497 views
  • 6 likes
  • 4 in conversation