Hello Everone,
I am trying to count the frequency of each observation for the three consequtive years. The counting should consider the missing years also. I do not want to sum the frequency of three consecutive listed years instead I want the sum the frequency of three consecutive years. The count of the first observation of any id is equal to its own frequency and the count of the second consecutive observation of any id is equal to the sum of the frequency of the first and second consecutive years. Please help me.
input year id frequency;
1980 A 2
1982 A 1
1983 A 1
1986 A 2
1996 B 3
1997 B 2
1998 B 1
2001 B 5
The final output should look like the following:
You want to create a variable called COUNT - that much I can see.
The rules for creating COUNT are
Let's say you know over all ID's, your minyear=1980 and maxyear=2010. If so, then make a array with bounds of 1978 (i.e. for 1980 minus 2 years) and 2010, as in:
data have;
input year id $ frequency;
datalines;
1980 A 2
1982 A 1
1983 A 1
1986 A 2
1996 B 3
1997 B 2
1998 B 1
2001 B 5
run;
data want;
set have;
by id;
array frq {1978:2010} _temporary_;
if first.id then call missing (of frq{*});
frq{year}=frequency;
count=sum(frq{year-2},frq{year-1},frq{year});
run;
What do you have so far?
RETAIN & LAG/DIF will be the key components in this solution.
This doesn't make sense by the way, you don't want to sum the three consecutive years, but you do? I'm assuming it's just bad phrasing but you should clarify.
I do not want to sum the frequency of three consecutive listed years instead I want the sum the frequency of three consecutive years.
Dear Reeza,
I am sorry I could not explain the problem properly. Thank you so much for replying to my questI got my desired code from Mr. mkeintz
You want to create a variable called COUNT - that much I can see.
The rules for creating COUNT are
Let's say you know over all ID's, your minyear=1980 and maxyear=2010. If so, then make a array with bounds of 1978 (i.e. for 1980 minus 2 years) and 2010, as in:
data have;
input year id $ frequency;
datalines;
1980 A 2
1982 A 1
1983 A 1
1986 A 2
1996 B 3
1997 B 2
1998 B 1
2001 B 5
run;
data want;
set have;
by id;
array frq {1978:2010} _temporary_;
if first.id then call missing (of frq{*});
frq{year}=frequency;
count=sum(frq{year-2},frq{year-1},frq{year});
run;
Thank you Mr./Mrs. mkeintz. Your code perfectly solved my problem
Does the case of data where you're missing the middle work as expected?
Year count
1985 1 1
1987 3 4
1988 1 5?1???
Dear Mr. Reeza,
I do not want to calculate the count in missing year. In my research, I am calculating the experience of bidders by counting the number of merger announcement they did in the last three years.
That's technically Mrs.
My question was does the answer above handle the situation above the way you expected.
I don't understand your response.
@nazmul wrote:
Dear Mr. Reeza,
I do not want to calculate the count in missing year. In my research, I am calculating the experience of bidders by counting the number of merger announcement they did in the last three years.
Hi,
I am trying your code on my research. But the log file is showing the following errors. Could you please fix it? I will appreciate your help?
73 data want1;
74 set have1;
75 by ACUSIP;
76 array frq {1976:2017} _temporary_;
77 if first.ACUSIP then call missing (of frq{*});
78 frq{year}=frequency;
79 count=sum(frq{year-2},frq{year-1},frq{year});
80 run;
ERROR: Array subscript out of range at line 79 column 13.
year=1976 ACUSIP=46070010 frequency=1 FIRST.ACUSIP=1 LAST.ACUSIP=1 count=. _ERROR_=1 _N_=7175
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 7176 observations read from the data set WORK.HAVE1.
WARNING: The data set WORK.WANT1 may be incomplete. When this step was stopped there were 7174
observations and 4 variables.
WARNING: Data set WORK.WANT1 was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.