BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Andr
Calcite | Level 5

Hi guys,

I am new to SAS and for my univerity project, I have a big set of data with many obsevations and variables and I need to create some new variables.

 

DATA: I have a database of anayst forecasts, dates on which the forecast was published and firms to which the forecast was related.

 

Example. Analyst 234 forecasted on 01May90 that firm 234 will have a profit of 500USD.

 

What I need to compute: For each observation, I need to compute two new variables relative experience and relative firm experience.

 

Relative firm experience is computed as the number of years the analyst has been in the database at the time he published the particular forecast (time in full years since the first forecast of this analyst in the database till this forecast) minus the average experience of all all analyst who submited a forecast for this firm in this year.

 

Secondly, I have to compute the firm specific experience. That is virtually the same, but I need to use the number of years since the first forecast of the particular analyst for a particular firm (meaning time in full years since the first forecast of this analyst for this firm) again minus the average firm specific experience of all analysts who submited a forecast for that same firm in the same year.

 

I have tried many approaches but since I am not really familliar with SAS, I could't find a way to compute it and I would be really greatful for your help.

1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Been in meetings most of the day.  This code merges on min dates for the various groups, you can use datediff function on these versus the main date to find how many years in each section, then run a sum of them per groups:

data have;
length forecast $ 6;
informat fdate mmddyy10.;
infile datalines missover;
input analyst firm forecast fdate;
format fdate mmddyy10.;
datalines;
01 23 4 01/03/1995
01 23 7 04/04/1997
01 24 400 03/04/1993
01 24 450 03/04/2000
01 24 300 11/05/2001
02 23 20 11/03/1994
02 23 2 01/03/1995
02 23 8 05/04/1997
02 24 420 04/04/1991
02 24 430 06/04/2000
02 24 310 07/05/2001
03 23 21 11/03/1992
03 23 5 01/03/1995
03 23 6 05/04/1997
03 24 420 04/04/1995
03 24 430 06/04/1998
03 24 310 06/05/2004
04 24 20 03/06/2007
04 24 24 12/12/2010
04 24 26 11/01/2011
;
run;

/* Get min date per anlyst, firm, anlyst/firm */
proc sql;
create table MIN_ANALYST as
select ANALYST,min(FDATE) as MIN_FDATE1
from HAVE
group by ANALYST;
create table MIN_FIRM as
select FIRM,min(FDATE) as MIN_FDATE2
from HAVE
group by FIRM;
create table MIN_FIRM_ANALYST as
select FIRM,ANALYST,min(FDATE) as MIN_FDATE3
from HAVE
group by FIRM,ANALYST;
quit;

/* Add these dates to main dataset */
proc sql;
create table INTER as
select A.*,
B.MIN_FDATE1 format=date9.,
C.MIN_FDATE2 format=date9.,
D.MIN_FDATE3 format=date9.,
datdif(B.MIN_FDATE1,A.FDATE,"ACT/ACT") / 365 as YRS1,
datdif(C.MIN_FDATE2,A.FDATE,"ACT/ACT") / 365 as YRS2,
datdif(D.MIN_FDATE3,A.FDATE,"ACT/ACT") / 365 as YRS3

from HAVE A
left join MIN_ANALYST B
on A.ANALYST=B.ANALYST
left join MIN_FIRM C
on A.FIRM=C.FIRM
left join MIN_FIRM_ANALYST D
on A.ANALYST=D.ANALYST and A.FIRM=D.FIRM;
quit;

/* So analyst years is */
proc sql;
create table WANT1 as
select ANALYST,sum(YRS1) as RESULT
from INTER
group by ANALYST;
quit;

View solution in original post

8 REPLIES 8
Reeza
Super User

Please post some sample data with your expected output. If your variables are being calculated in a single row it's a relatively straightforward problem, but hard  to see or explain without data.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Firstly, what is this "DATA&colon" - I have noticed this in a quite a few posts recently.  <- ignore this, its a forum bug

 

For your problem, can you provide some sample test data - in the form of a datastep, and what you want the output to look like.  Its easier to write code if it can be run.

Andr
Calcite | Level 5

OK, a simplified example of the data could look like this

 

***DATA STEP TO CREATE LABS; DATA forecasts;
LENGTH Forecast $ 6;
INFORMAT FDATE MMDDYY10.;
INFILE DATALINES MISSOVER; 
INPUT ANALYST FIRM FORECAST FDATE;
FORMAT DATE MMDDYY10.;
DATALINES;
01 23 4 01/03/1995
01 23 7 04/04/1997
01 24 400 03/04/1993
01 24 450 03/04/2000
01 24 300 11/05/2001
02 23 20 11/03/1994
02 23 2 01/03/1995
02 23 8 05/04/1997
02 24 420 04/04/1991
02 24 430 06/04/2000
02 24 310 07/05/2001
03 23 21 11/03/1992
03 23 5 01/03/1995
03 23 6 05/04/1997
03 24 420 04/04/1995
03 24 430 06/04/1998
03 24 310 06/05/2004
04 24 20 03/06/2007
04 24 24 12/12/2010 ;

 An example for relative general experience for the second line would be comuted like this: Analyst 01 submited his fist forecast in 1993 and therefore has been forecastin for 1993-1997 five years including the first one and has five years of general experience. At the same time, the oldest forecast for firm 23 was in 1995, therefore he has 3 years of firm specific experience.

 

I need to relate them to the other forecast. In 1997 also analysts 02 and 03 have submitted a forecast for firm 23. Analyst 04 hasn't, so I ignore him.

 

Their general experience (time form their first forecast in the dataset) is 1997-1991 + 1 = 7 years for analyst 02 and 1997-1992 + 1 = 6 for analyst 03. The average of all analysts is (5+7+6)/3=6 years and the relative general experience of analyst 01 is 5-6=-1.

 

Their firm specific experiences (time form their first forecast for firm 23 in the dataset) are 1997-1994 +1 = 4 for analyst 02 and 1997-1992 +1= 6 for analyst 03. The average of all analysts who submited a forecast for this firm in 1997 is (3+4+6)/3=4,33 and analyst 01 therefore has -1,33 years of relative firm specific experience.

 

I hope I explained it clear enough and you will be able to help me further. Thank you very much in advance!

 

 

 

 

ballardw
Super User

May have been a paste error but there isn't any explicit data for analyst 04 unless that is in the last line.

<Got it, not space between year and analyst>

However you say

Analyst 01 submited his fist forecast in 1993 and therefore has been forecastin for 1993-1997 five years including the first one and has five years of general experience. At the same time, the oldest forecast for firm 23 was in 1995, therefore he has 3 years of firm specific experience.

 

But analyst 1 example shows first forecast in 1993 and last in 2001, not 1997, which would be 9 years. Unless there's another rule (or did you add data to an already existing example text?)

Similar analyst 2 has forecasts from 1991-2001 not 1991 to 1997 as calculated in your example.

 

 

Andr
Calcite | Level 5

Here it is:

***DATA STEP TO CREATE LABS; DATA forecasts;
LENGTH Forecast $ 6;
INFORMAT FDATE MMDDYY10.;
INFILE DATALINES MISSOVER; 
INPUT ANALYST FIRM FORECAST FDATE;
FORMAT DATE MMDDYY10.;
DATALINES;
01 23 4 01/03/1995
01 23 7 04/04/1997
01 24 400 03/04/1993
01 24 450 03/04/2000
01 24 300 11/05/2001
02 23 20 11/03/1994
02 23 2 01/03/1995
02 23 8 05/04/1997
02 24 420 04/04/1991
02 24 430 06/04/2000
02 24 310 07/05/2001
03 23 21 11/03/1992
03 23 5 01/03/1995
03 23 6 05/04/1997
03 24 420 04/04/1995
03 24 430 06/04/1998
03 24 310 06/05/2004
04 24 20 03/06/2007
04 24 24 12/12/2010
04 24 26 11/01/2011
;
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Been in meetings most of the day.  This code merges on min dates for the various groups, you can use datediff function on these versus the main date to find how many years in each section, then run a sum of them per groups:

data have;
length forecast $ 6;
informat fdate mmddyy10.;
infile datalines missover;
input analyst firm forecast fdate;
format fdate mmddyy10.;
datalines;
01 23 4 01/03/1995
01 23 7 04/04/1997
01 24 400 03/04/1993
01 24 450 03/04/2000
01 24 300 11/05/2001
02 23 20 11/03/1994
02 23 2 01/03/1995
02 23 8 05/04/1997
02 24 420 04/04/1991
02 24 430 06/04/2000
02 24 310 07/05/2001
03 23 21 11/03/1992
03 23 5 01/03/1995
03 23 6 05/04/1997
03 24 420 04/04/1995
03 24 430 06/04/1998
03 24 310 06/05/2004
04 24 20 03/06/2007
04 24 24 12/12/2010
04 24 26 11/01/2011
;
run;

/* Get min date per anlyst, firm, anlyst/firm */
proc sql;
create table MIN_ANALYST as
select ANALYST,min(FDATE) as MIN_FDATE1
from HAVE
group by ANALYST;
create table MIN_FIRM as
select FIRM,min(FDATE) as MIN_FDATE2
from HAVE
group by FIRM;
create table MIN_FIRM_ANALYST as
select FIRM,ANALYST,min(FDATE) as MIN_FDATE3
from HAVE
group by FIRM,ANALYST;
quit;

/* Add these dates to main dataset */
proc sql;
create table INTER as
select A.*,
B.MIN_FDATE1 format=date9.,
C.MIN_FDATE2 format=date9.,
D.MIN_FDATE3 format=date9.,
datdif(B.MIN_FDATE1,A.FDATE,"ACT/ACT") / 365 as YRS1,
datdif(C.MIN_FDATE2,A.FDATE,"ACT/ACT") / 365 as YRS2,
datdif(D.MIN_FDATE3,A.FDATE,"ACT/ACT") / 365 as YRS3

from HAVE A
left join MIN_ANALYST B
on A.ANALYST=B.ANALYST
left join MIN_FIRM C
on A.FIRM=C.FIRM
left join MIN_FIRM_ANALYST D
on A.ANALYST=D.ANALYST and A.FIRM=D.FIRM;
quit;

/* So analyst years is */
proc sql;
create table WANT1 as
select ANALYST,sum(YRS1) as RESULT
from INTER
group by ANALYST;
quit;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 998 views
  • 2 likes
  • 4 in conversation