BookmarkSubscribeRSS Feed
aaronsterri
Calcite | Level 5

Hi, I have wficn1 which is fund id at the CRSP data.

I have a post variable that turns on when there is a change to the fund.

In the attached sample data, I have 4 wficn1s (100001, 102349, 102356, 603184).

For fund 102349, as you can see “post” turns on at 2010 1st quarter. I wanted to create a dataset with following variables: t, tplus1, tplus2, tplus3….and so on, in addition to tminus1, tminus2, tminus3…..and so on. So for this fund, I wanted t to be equal to one for 2010 1q, tplus1 to be equal to one for 2010 2q, and tminus1 to be equal to one for 2009 4q, and so on.

i wanted to do this by fund. 

 

The issue is that post turns on at different times for different funds. And sometimes, it will never turn on.

anybody with insights on how to do this? i have attached the sas file. 

Thank you!

9 REPLIES 9
mkeintz
PROC Star

You say you want lots of new variables, but you describe some that looks like a single new column (i.e. one variable, call it OFFSET), where

 

For record 2010Q1   OFFSET=0
For record 2010Q2   OFFSET=1
...

and 
For record 2009Q4  OFFSET=-1
For record 2009Q3  OFFSET=-2

...

 

And note that I also suggest you have a record 0.    You describe a record at -1 followed by a record at +1.  Is that what you really want?

 

And for WFICN that never has POST change from 0 to 1, do you want to drop the data observations?  Or just to set variable OFFSET to a missing value?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
aaronsterri
Calcite | Level 5
I agree with you. I would need to create an offset variable. Could you help? I would just leave offset to zero if post never turns on. Thank you
aaronsterri
Calcite | Level 5

hi,

i have a group variable called "fund". in my data i have many funds.

 

i have a time variable called post that turns on to one. The issue is that this turns on at different times for different funds. 

 

i wanted to create time indicators. consider "fund a" where post turns on at 2010 then i am looking to indicate the following:

for 2010, variable t would equal to 1.

for 2009, variable t-1 equal to 1.

for 2011, variable t+1 would equal to 1.

 

as mentioned the issue is that post turns on at different times for different funds. 

 

could you please help? thank you!

ScottBass
Rhodochrosite | Level 12

@aaronsterri wrote:

hi,

i have a group variable called "fund". in my data i have many funds.

 

i have a time variable called post that turns on to one. The issue is that this turns on at different times for different funds. 

 

i wanted to create time indicators. consider "fund a" where post turns on at 2010 then i am looking to indicate the following:

for 2010, variable t would equal to 1.

for 2009, variable t-1 equal to 1.

for 2011, variable t+1 would equal to 1.

 

as mentioned the issue is that post turns on at different times for different funds. 

 

could you please help? thank you!


 

Post have and want datasets using datalines and the Insert SAS code icon.

t-1 and t+1 are invalid variable names.

I'm not going to try to guess what you really want.


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
aaronsterri
Calcite | Level 5
fundyearpostt_minus2t_minus1t_0tplus1tplus2
a2009001000
a2010100100
a2011100010
a2012100001
b2007010000
b2008001000
b2009100100
b2010100010
c2012000000
c2013010000
c2014001000
c2015100100

 

thank you. i have fund, year, and post and would like the following columns such as t_minus2 all the way to t_plus 2.

 

thank you!

ScottBass
Rhodochrosite | Level 12

@aaronsterri wrote:
fund year post t_minus2 t_minus1 t_0 tplus1 tplus2
a 2009 0 0 1 0 0 0
a 2010 1 0 0 1 0 0
a 2011 1 0 0 0 1 0
a 2012 1 0 0 0 0 1
b 2007 0 1 0 0 0 0
b 2008 0 0 1 0 0 0
b 2009 1 0 0 1 0 0
b 2010 1 0 0 0 1 0
c 2012 0 0 0 0 0 0
c 2013 0 1 0 0 0 0
c 2014 0 0 1 0 0 0
c 2015 1 0 0 1 0 0

 

thank you. i have fund, year, and post and would like the following columns such as t_minus2 all the way to t_plus 2.

 

thank you!


 

Post have and want datasets using datalines and the Insert SAS code icon.

When I cut and pasted your code into SAS I got an error.

 

Edit:  And can you also state the actual analysis you want to do with this data structure?  TBH it reeksof bad data modelling, although hard to know for sure since we don't know what you actually want to do with this data structure.


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
mkeintz
PROC Star

BTW,. your YEARQTR variable as you have it is a character variable (200 bytes long - i.e. lots of trailing blanks).  It is not useful for any sort of date or date interval calculation.  You need a true date variable, but with a display format of YYQ6.  (1990Q1,  1990Q2, …).  Say you have data set HAVE with YEAR QTR WFICN1 and POST.  Then: 

 

data need (keep=wficn1 yearqtr_0);
  set have;
  by wficn1 post;
  if first.post=1 and post=1;
  DATE_POSTEQ1 = mdy(3*qtr,1,year);
  format DATE_POSTEQ1 YYQ6. ;
run;

data want;
  merge have  need (in=in_need);
  by wficn1;
  YEARQTR=mdy(3*qtr,1,year);
  format yearqtr YYQ6.;
  if in_need=1 then offset=intck('qtr',DATE_POSTEQ1,yearqtr);
  /* else offset=0;*/
run;

 

  1. The program creates data set NEED with 1 observation per WFICN1, corresponding to the first OBS with POST=1.  The "by wficn1 post;" statement tells SAS the data is sorted by WFICN1 POST  (also sorted by wficn1 year qtr, but that's not relevant here), and it also tells SAS to set a dummy FIRST.POST=1 whenever the value of post changes (i.e. the first POST=0 and the first POST=1).  Or when ever a by variable to its left (i.e. WFICN1) changes. 

  2. The IF statement without a THEN is a subsetting if, i.e. it's a filter.
  3. Data set need also creates a new date variable DATE_POSTEQ1.  It will be the first date with POST=1 for each WFICN1.  But even though it's a date, it will be displayed as 1990Q1 (01jan1990),  1990Q2 (01apr1990), etc.
  4. Then NEED and HAVE are merged by wficn1, which means the single record in NEED will be match-merged with all the corresponding records in HAVE (so the variable DATE_POSTEQ1 will be available in every obs).  As a result the INTCK function (counts various calendar intervals) will count the number of quarters from DATE_POSTEQ1 to the newly created YEARQTR, producing the new variable OFFSET.
  5. The IN_NEED is a temporary dummy variable indicating whether the merge-in-hand has input from data set NEED (wficn10001 does not, so IN_NEED=0 for it).  As a results OFFSET=. for all the 10001 records.  But if you insist on OFFSET=0 for that situation, then de-comment my ELSE … statement.  But remember each of the other ID's have one record with OFFSET=0.

 

 

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
aaronsterri
Calcite | Level 5


Thanks so much for your kind reply.

 

for some reason, i get the following error for the yearqtr variable.

ERROR 48-59: The format $YYQ was not found or could not be loaded.

 

any help is really appreciated. have a great night!

mkeintz
PROC Star

@aaronsterri wrote:


Thanks so much for your kind reply.

 

for some reason, i get the following error for the yearqtr variable.

ERROR 48-59: The format $YYQ was not found or could not be loaded.

 

any help is really appreciated. have a great night!


This is just a matter of paying close attention to the message, vs my suggestion.  In particular, you will not find a format $YYQ used in my code.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 964 views
  • 2 likes
  • 3 in conversation