BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
nazmul
Quartz | Level 8

Hello Everone,

 

I am trying to count the frequency of each observation for the three consequtive years. The counting should consider the missing years also. I do not want to sum the frequency of three consecutive listed years instead I want the sum the frequency of three consecutive years. The count of the first observation of any id is equal to its own frequency and the count of the second consecutive observation of any id is equal to the sum of the frequency of the first and second consecutive years. Please help me.

 

input year id frequency;
1980 A 2
1982 A 1
1983 A 1
1986 A 2
1996 B 3
1997 B 2
1998 B 1
2001 B 5

 

The final output should look like the following:

 

Picture1.png

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

You want to create a variable called COUNT - that much I can see.

 

The rules for creating COUNT are

  1. For first.id,  COUNT=frequency
  2. Otherwise COUNT for a given year is the sum of frequency for Year, Year-1, and Year-2, correct?

 

Let's say you know over all ID's,  your minyear=1980 and maxyear=2010.  If so, then make a array with bounds of 1978 (i.e. for 1980 minus 2 years) and 2010, as in:

 

data have;
  input year id $ frequency;
datalines;
1980 A 2
1982 A 1
1983 A 1
1986 A 2
1996 B 3
1997 B 2
1998 B 1
2001 B 5
run;


data want;
  set have;
  by id;
  array frq {1978:2010} _temporary_;
  if first.id then call missing (of frq{*});
  frq{year}=frequency;
  count=sum(frq{year-2},frq{year-1},frq{year});
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

9 REPLIES 9
Reeza
Super User

What do you have so far? 

 

RETAIN & LAG/DIF will be the key components in this solution. 

 

This doesn't make sense by the way, you don't want to sum the three consecutive years, but you do? I'm assuming it's just bad phrasing but you should clarify. 

I do not want to sum the frequency of three consecutive listed years instead I want the sum the frequency of three consecutive years.

nazmul
Quartz | Level 8

Dear Reeza,

 

I am sorry I could not explain the problem properly. Thank you so much for replying to my questI got my desired code from Mr. mkeintz

mkeintz
PROC Star

You want to create a variable called COUNT - that much I can see.

 

The rules for creating COUNT are

  1. For first.id,  COUNT=frequency
  2. Otherwise COUNT for a given year is the sum of frequency for Year, Year-1, and Year-2, correct?

 

Let's say you know over all ID's,  your minyear=1980 and maxyear=2010.  If so, then make a array with bounds of 1978 (i.e. for 1980 minus 2 years) and 2010, as in:

 

data have;
  input year id $ frequency;
datalines;
1980 A 2
1982 A 1
1983 A 1
1986 A 2
1996 B 3
1997 B 2
1998 B 1
2001 B 5
run;


data want;
  set have;
  by id;
  array frq {1978:2010} _temporary_;
  if first.id then call missing (of frq{*});
  frq{year}=frequency;
  count=sum(frq{year-2},frq{year-1},frq{year});
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
nazmul
Quartz | Level 8

Thank you Mr./Mrs. mkeintz. Your code perfectly solved my problem

Reeza
Super User

Does the case of data where you're missing the middle work as expected? 

 

Year count

1985 1 1

1987 3 4

1988 1 5?1???

nazmul
Quartz | Level 8

Dear Mr. Reeza,

 

I do not want to calculate the count in missing year. In my research, I am calculating the experience of bidders by counting the number of merger announcement they did in the last three years. 

Reeza
Super User

That's technically Mrs.

 

My question was does the answer above handle the situation above the way you expected. 

I don't understand your response. 

 


@nazmul wrote:

Dear Mr. Reeza,

 

I do not want to calculate the count in missing year. In my research, I am calculating the experience of bidders by counting the number of merger announcement they did in the last three years. 


 

nazmul
Quartz | Level 8
I only checked code mkeintz posted in this conversation. That code
perfectly solved my problem. Thank you for asking. I am sorry for Mr. in
place of Ms. Have a good day!

##- Please type your reply above this line. Simple formatting, no
attachments. -##
nazmul
Quartz | Level 8

Hi,

 

I am trying your code on my research. But the log file is showing the following errors. Could you please fix it? I will appreciate your help?

 

73 data want1;
74 set have1;
75 by ACUSIP;
76 array frq {1976:2017} _temporary_;
77 if first.ACUSIP then call missing (of frq{*});
78 frq{year}=frequency;
79 count=sum(frq{year-2},frq{year-1},frq{year});
80 run;

ERROR: Array subscript out of range at line 79 column 13.
year=1976 ACUSIP=46070010 frequency=1 FIRST.ACUSIP=1 LAST.ACUSIP=1 count=. _ERROR_=1 _N_=7175
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 7176 observations read from the data set WORK.HAVE1.
WARNING: The data set WORK.WANT1 may be incomplete. When this step was stopped there were 7174
observations and 4 variables.
WARNING: Data set WORK.WANT1 was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 973 views
  • 1 like
  • 3 in conversation