DATA Step, Macro, Functions and more

Counting the frequency of three consecutive yrs with first.obs taking the frequency as its count

Accepted Solution Solved
Reply
Contributor
Posts: 66
Accepted Solution

Counting the frequency of three consecutive yrs with first.obs taking the frequency as its count

Hello Everone,

 

I am trying to count the frequency of each observation for the three consequtive years. The counting should consider the missing years also. I do not want to sum the frequency of three consecutive listed years instead I want the sum the frequency of three consecutive years. The count of the first observation of any id is equal to its own frequency and the count of the second consecutive observation of any id is equal to the sum of the frequency of the first and second consecutive years. Please help me.

 

input year id frequency;
1980 A 2
1982 A 1
1983 A 1
1986 A 2
1996 B 3
1997 B 2
1998 B 1
2001 B 5

 

The final output should look like the following:

 

Picture1.png


Accepted Solutions
Solution
‎05-29-2017 09:23 PM
Trusted Advisor
Posts: 1,019

Re: Counting the frequency of three consecutive yrs with first.obs taking the frequency as its count

You want to create a variable called COUNT - that much I can see.

 

The rules for creating COUNT are

  1. For first.id,  COUNT=frequency
  2. Otherwise COUNT for a given year is the sum of frequency for Year, Year-1, and Year-2, correct?

 

Let's say you know over all ID's,  your minyear=1980 and maxyear=2010.  If so, then make a array with bounds of 1978 (i.e. for 1980 minus 2 years) and 2010, as in:

 

data have;
  input year id $ frequency;
datalines;
1980 A 2
1982 A 1
1983 A 1
1986 A 2
1996 B 3
1997 B 2
1998 B 1
2001 B 5
run;


data want;
  set have;
  by id;
  array frq {1978:2010} _temporary_;
  if first.id then call missing (of frq{*});
  frq{year}=frequency;
  count=sum(frq{year-2},frq{year-1},frq{year});
run;

View solution in original post


All Replies
Super User
Posts: 19,791

Re: Counting the frequency of three consecutive yrs with first.obs taking the frequency as its count

What do you have so far? 

 

RETAIN & LAG/DIF will be the key components in this solution. 

 

This doesn't make sense by the way, you don't want to sum the three consecutive years, but you do? I'm assuming it's just bad phrasing but you should clarify. 

I do not want to sum the frequency of three consecutive listed years instead I want the sum the frequency of three consecutive years.

Contributor
Posts: 66

Re: Counting the frequency of three consecutive yrs with first.obs taking the frequency as its count

Dear Reeza,

 

I am sorry I could not explain the problem properly. Thank you so much for replying to my questI got my desired code from Mr. mkeintz

Solution
‎05-29-2017 09:23 PM
Trusted Advisor
Posts: 1,019

Re: Counting the frequency of three consecutive yrs with first.obs taking the frequency as its count

You want to create a variable called COUNT - that much I can see.

 

The rules for creating COUNT are

  1. For first.id,  COUNT=frequency
  2. Otherwise COUNT for a given year is the sum of frequency for Year, Year-1, and Year-2, correct?

 

Let's say you know over all ID's,  your minyear=1980 and maxyear=2010.  If so, then make a array with bounds of 1978 (i.e. for 1980 minus 2 years) and 2010, as in:

 

data have;
  input year id $ frequency;
datalines;
1980 A 2
1982 A 1
1983 A 1
1986 A 2
1996 B 3
1997 B 2
1998 B 1
2001 B 5
run;


data want;
  set have;
  by id;
  array frq {1978:2010} _temporary_;
  if first.id then call missing (of frq{*});
  frq{year}=frequency;
  count=sum(frq{year-2},frq{year-1},frq{year});
run;
Contributor
Posts: 66

Re: Counting the frequency of three consecutive yrs with first.obs taking the frequency as its count

Thank you Mr./Mrs. mkeintz. Your code perfectly solved my problem

Super User
Posts: 19,791

Re: Counting the frequency of three consecutive yrs with first.obs taking the frequency as its count

Does the case of data where you're missing the middle work as expected? 

 

Year count

1985 1 1

1987 3 4

1988 1 5?1???

Contributor
Posts: 66

Re: Counting the frequency of three consecutive yrs with first.obs taking the frequency as its count

Dear Mr. Reeza,

 

I do not want to calculate the count in missing year. In my research, I am calculating the experience of bidders by counting the number of merger announcement they did in the last three years. 

Super User
Posts: 19,791

Re: Counting the frequency of three consecutive yrs with first.obs taking the frequency as its count

That's technically Mrs.

 

My question was does the answer above handle the situation above the way you expected. 

I don't understand your response. 

 


nazmul wrote:

Dear Mr. Reeza,

 

I do not want to calculate the count in missing year. In my research, I am calculating the experience of bidders by counting the number of merger announcement they did in the last three years. 


 

Contributor
Posts: 66

Re: Counting the frequency of three consecutive yrs with first.obs taking the frequency as its count

I only checked code mkeintz posted in this conversation. That code
perfectly solved my problem. Thank you for asking. I am sorry for Mr. in
place of Ms. Have a good day!

##- Please type your reply above this line. Simple formatting, no
attachments. -##
Contributor
Posts: 66

Re: Counting the frequency of three consecutive yrs with first.obs taking the frequency as its count

Hi,

 

I am trying your code on my research. But the log file is showing the following errors. Could you please fix it? I will appreciate your help?

 

73 data want1;
74 set have1;
75 by ACUSIP;
76 array frq {1976:2017} _temporary_;
77 if first.ACUSIP then call missing (of frq{*});
78 frq{year}=frequency;
79 count=sum(frq{year-2},frq{year-1},frq{year});
80 run;

ERROR: Array subscript out of range at line 79 column 13.
year=1976 ACUSIP=46070010 frequency=1 FIRST.ACUSIP=1 LAST.ACUSIP=1 count=. _ERROR_=1 _N_=7175
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 7176 observations read from the data set WORK.HAVE1.
WARNING: The data set WORK.WANT1 may be incomplete. When this step was stopped there were 7174
observations and 4 variables.
WARNING: Data set WORK.WANT1 was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 193 views
  • 1 like
  • 3 in conversation