Hi everybody, I am a novice in SAS, and this is the first time I post my question on our forum, so I am using SAS EG at this moment. My data is simply transposed already as below:
The first column is the label of the companies (character type) (I have around 10,000 companies sorted ascendingly), the second column is about the date (ddmmyyyy9.) (daily data from 1/11987 to 31/12/2019),
and the third column is a numeric variable named "return index" (R)accordingly.
My purpose is to set the variable"R" as missing if (1 + Ri,d) * (1+ Ri,d-1) less than 50% and at least either Ri,d or Ri,d-1 is greater than 100% while Ri,d is the return of stock i (column Type) on day d (column date)
I am wondering if you can help me to solve this problem or give a hint to deal with it by using SAS (version 9.4) or SAS EG(version 8.2- 64 bit) (my preference).
Type Date R
131712 01JAN1987 .
131712 02JAN1987 .
131712 05JAN1987 .
131712 06JAN1987 .
131712 07JAN1987 .
.
.
131712 31DEC2019 67
28829X 01JAN1987 89.5
28829X 02JAN1987 89.7
28829X 02JAN1987 90.66
28829X 02JAN1987 91.34
.
.
28829X 31DEC2019 85
.
Many thanks and cheers.
data want;
set have;
by type;
array _date(11712) _temporary_;
array _r(11712) _temporary_;
retain _date: _r: start;
if first.type then do;
call missing(of _date1-_date11712);
call missing(of _r1-_r11712);
start = 0;
end;
start = start+1;
_date[start]=date;
_r[start] = r;
if last.date then do;
do i = 1 to start;
if i > 1 then do;
if _r[i]*_r[i-1] <= 50 and (_r[i] > 100 or _r[i-1] > 100) then do;
r=.;
date = _date[i];
output;
end;
else do;
r=_r[i];
date = _date[i];
output;
end;
end;
else do;
r=_r[i];
date = _date[i];
output;
end;
end;
end;
run;
Sorry, it is my faults about the format and data:
- daily data from 1/1/1987 instead of 1/111987
- And the data is fixed as below (I changed the dates of companies 28829X)
Type Date R
131712 01JAN1987 .
131712 02JAN1987 .
131712 05JAN1987 .
131712 06JAN1987 .
131712 07JAN1987 .
.
.
131712 31DEC2019 67
28829X 01JAN1987 89.5
28829X 02JAN1987 89.7
28829X 05JAN1987 90.66
28829X 06JAN1987 91.34
.
.
From my point of view, I thought that I may set up a new column R_1 that equal to Ri.d-1* Ri,d, then I will set missing if R_1 is less than 50% and (Ri,d or Ri,d-1 less than 100%).
Do you think this solution makes sense, and if it is the case, I am wondering if you can suggest me the codes to do so.
proc sort data=have;
by type date;
run;
data want;
set have;
by type;
array _date(32*366) _temporary_;
array _r(32*366) _temporary_;
retain _date: _r: start;
if first.type then do;
call missing(of _date:);
call missing(of _r:);
start = 0;
end;
start = start+1;
_date[start]=date;
_r[start] = r;
if last.date then do;
do i = 1 to start;
if i > 1 then do;
if _r[i]*_r[i-1] < 50 and (r[i] < 100 or r[i-1] <100) then do;
r=.;
date = date[i];
output;
end;
else do;
r=r[i];
date = date[i];
output;
end;
end;
else do;
r=r[i];
date = date[i];
output;
end;
end;
end;
run;
Hi @smantha , thank you very much for your help!
Unfortunately, when I put the codes into SAS, it results in some errors like that in my log, can you please have a look at it? Thanks
95 data want;
96 set have;
97 by type;
98 array _date(32*366) _temporary_;
_
22
200
99 array _r(32*366) _temporary_;
_
22
200
ERROR 22-322: Syntax error, expecting one of the following: ), ',', :.
ERROR 200-322: The symbol is not recognized and will be ignored.
100 retain _date: _r: start;
101 if first.type then do;
102 call missing(of _date:);
_______
252
103 call missing(of _r:);
_______
252
ERROR 252-185: The MISSING subroutine call does not have enough arguments.
104 start = 0;
105 end;
106 start = start+1;
107 _date[start]=date;
108 _r[start] = r;
109 if last.date then do;
110 do i = 1 to start;
111 if i > 1 then do;
112 if _r[i]*_r[i-1] < 50 and (r[i] < 100 or r[i-1] <100) then do;
ERROR: Undeclared array referenced: r.
ERROR: Variable r has not been declared as an array.
ERROR: Undeclared array referenced: r.
ERROR: Variable r has not been declared as an array.
113 r=.;
114 date = date[i];
ERROR: Undeclared array referenced: date.
ERROR: Variable date has not been declared as an array.
115 output;
116 end;
117 else do;
118 r=r[i];
ERROR: Undeclared array referenced: r.
ERROR: Variable r has not been declared as an array.
119 date = date[i];
ERROR: Undeclared array referenced: date.
ERROR: Variable date has not been declared as an array.
120 output;
121 end;
122 end;
123 else do;
124 r=r[i];
ERROR: Undeclared array referenced: r.
ERROR: Variable r has not been declared as an array.
125 date = date[i];
ERROR: Undeclared array referenced: date.
ERROR: Variable date has not been declared as an array.
126 output;
127 end;
128 end;
129 end;
130 run;
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0 observations and 14 variables.
WARNING: Data set WORK.WANT was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
Hi @smantha
Thank you for your simpler version, but I think with this version, we can not separate the return(R) of different companies, am I correct? Because there would be one case that we multiple r of company 1 in 31/12/2019 with r of company 2 in 1/1/1987. Do you have any solution for it, I am really into this simple solution but it seems that it has not yet been completed.
@Phil_NZ wrote:
Hi @smantha
Thank you for your simpler version, but I think with this version, we can not separate the return(R) of different companies, am I correct? Because there would be one case that we multiple r of company 1 in 31/12/2019 with r of company 2 in 1/1/1987. Do you have any solution for it, I am really into this simple solution but it seems that it has not yet been completed.
If the data is grouped then use BY group processing. The code had a BY statement, but didn't use it for anything. So reset the new LAG_R value to missing when you start a new group. Since missing is less than any actual number that will mean the IF test will succeed and R1 will be missing for the first observation of each group. If you want something else for R1 for the first observation then change the logic to reflect that.
data want;
set have;
by type;
lag_r =lag(R);
if first.type then lag_r=.;
if lag_r*r < 50 and (lag_r<100 or r<100) then r1=.;
else r1=r;
run;
If you want to use a calculated value (to make the calculation obvious for the next developer), use a macro function:
array _date{%eval(32*366)} _temporary_;
Hi @smantha
Thank you for your correction, but when I change the number to 11712, there is another error emerged as below, can you please help me to have a look at it. Many thanks and cheers.
data want;
set have;
by type;
array _date(11712) _temporary_;
array _r(11712) _temporary_;
retain _date: _r: start;
if first.type then do;
call missing(of _date:);
call missing(of _r:);
start = 0;
end;
start = start+1;
_date[start]=date;
_r[start] = r;
if last.date then do;
do i = 1 to start;
if i > 1 then do;
if _r[i]*_r[i-1] <= 50 and (r[i] > 100 or r[i-1] > 100) then do;
r=.;
date = date[i];
output;
end;
else do;
r=r[i];
date = date[i];
output;
end;
end;
else do;
r=r[i];
date = date[i];
output;
end;
end;
end;
run;
28 data want;
29 set have;
30 by type;
31 array _date(11712) _temporary_;
32 array _r(11712) _temporary_;
33 retain _date: _r: start;
34 if first.type then do;
35 call missing(of _date:);
_______
252
36 call missing(of _r:);
_______
252
ERROR 252-185: The MISSING subroutine call does not have enough arguments.
37 start = 0;
38 end;
39 start = start+1;
40 _date[start]=date;
41 _r[start] = r;
42 if last.date then do;
43 do i = 1 to start;
44 if i > 1 then do;
45 if _r[i]*_r[i-1] <= 50 and (r[i] > 100 or r[i-1] > 100) then do;
ERROR: Undeclared array referenced: r.
ERROR: Variable r has not been declared as an array.
ERROR: Undeclared array referenced: r.
ERROR: Variable r has not been declared as an array.
46 r=.;
47 date = date[i];
2 The SAS System 21:50 Thursday, July 9, 2020
ERROR: Undeclared array referenced: date.
ERROR: Variable date has not been declared as an array.
48 output;
49 end;
50 else do;
51 r=r[i];
ERROR: Undeclared array referenced: r.
ERROR: Variable r has not been declared as an array.
52 date = date[i];
ERROR: Undeclared array referenced: date.
ERROR: Variable date has not been declared as an array.
53 output;
54 end;
55 end;
56 else do;
57 r=r[i];
ERROR: Undeclared array referenced: r.
ERROR: Variable r has not been declared as an array.
58 date = date[i];
ERROR: Undeclared array referenced: date.
ERROR: Variable date has not been declared as an array.
59 output;
60 end;
61 end;
62 end;
63
64 %LET _CLIENTTASKLABEL=;
65 %LET _CLIENTPROCESSFLOWNAME=;
66 %LET _CLIENTPROJECTPATH=;
67 %LET _CLIENTPROJECTPATHHOST=;
68 %LET _CLIENTPROJECTNAME=;
69 %LET _SASPROGRAMFILE=;
70 %LET _SASPROGRAMFILEHOST=;
71
72 ;*';*";*/;quit;run;
____
180
ERROR 180-322: Statement is not valid or it is used out of proper order.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0 observations and 14 variables.
WARNING: Data set WORK.WANT was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
73 ODS _ALL_ CLOSE;
74
75
76 QUIT; RUN;
77
Testing revealed that you need to call the variables as a variable list:
if first.type then do;
call missing(of _date1-_date11712);
call missing(of _r1-_r11712);
start = 0;
end;
@smantha and @Kurt_Bremser:
call missing(of _date1-_date11712); call missing(of _r1-_r11712);
This would create a lot of new numeric variables while leaving the values of the temporary array elements unchanged. To set the array elements to missing I would use:
call missing(of _date[*]); call missing(of _r[*]);
Similarly, _date: and _r: in the RETAIN statement does not refer to the array elements, but to non-existing variables, while the temporary array elements are retained by default anyway.
You also need to make sure that all array references are done with the correct names (leading underlines!).
data want;
set have;
by type;
array _date(11712) _temporary_;
array _r(11712) _temporary_;
retain _date: _r: start;
if first.type then do;
call missing(of _date1-_date11712);
call missing(of _r1-_r11712);
start = 0;
end;
start = start+1;
_date[start]=date;
_r[start] = r;
if last.date then do;
do i = 1 to start;
if i > 1 then do;
if _r[i]*_r[i-1] <= 50 and (_r[i] > 100 or _r[i-1] > 100) then do;
r=.;
date = _date[i];
output;
end;
else do;
r=_r[i];
date = _date[i];
output;
end;
end;
else do;
r=_r[i];
date = _date[i];
output;
end;
end;
end;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.