BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Phil_NZ
Barite | Level 11

Hi everybody, I am a novice in SAS, and this is the first time I post my question on our forum, so I am using SAS EG at this moment. My data is simply transposed already as below:
The first column is the label of the companies (character type) (I have around 10,000 companies sorted ascendingly), the second column is about the date (ddmmyyyy9.) (daily data from 1/11987 to 31/12/2019),
and the third column is a numeric variable named "return index" (R)accordingly.
My purpose is to set the variable"R" as missing if (1 + Ri,d) * (1+ Ri,d-1) less than 50% and at least either Ri,d or Ri,d-1 is greater than 100% while Ri,d is the return of stock i (column Type) on day d (column date)
I am wondering if you can help me to solve this problem or give a hint to deal with it by using SAS (version 9.4) or SAS EG(version 8.2- 64 bit) (my preference).

Type     Date               R

131712 01JAN1987    .
131712 02JAN1987    .
131712 05JAN1987    .
131712 06JAN1987    .
131712 07JAN1987    .
.
.
131712 31DEC2019   67
28829X 01JAN1987   89.5
28829X 02JAN1987   89.7
28829X 02JAN1987   90.66
28829X 02JAN1987   91.34
.
.
28829X 31DEC2019   85
.
Many thanks and cheers.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
1 ACCEPTED SOLUTION

Accepted Solutions
smantha
Lapis Lazuli | Level 10
data want;
set have;
by type;
array _date(11712) _temporary_;
array _r(11712) _temporary_;
retain _date: _r: start;
if first.type then do;
 call missing(of _date1-_date11712);
 call missing(of _r1-_r11712);
 start = 0;
end;
start = start+1;
_date[start]=date;
_r[start] = r;
if last.date then do;
 do i = 1 to start;
    if i > 1 then do;
        if _r[i]*_r[i-1] <= 50 and (_r[i] > 100 or _r[i-1] > 100) then do;
              r=.;
              date = _date[i];
              output;
         end;
         else do;
             r=_r[i];
              date = _date[i];
              output;
         end;
    end;
     else do;
          r=_r[i];
              date = _date[i];
              output;
     end;
 end; 
end;
run;

View solution in original post

16 REPLIES 16
Phil_NZ
Barite | Level 11

Sorry, it is my faults about the format and data:
- daily data from 1/1/1987 instead of 1/111987
- And the data is fixed as below (I changed the dates of companies 28829X)

Type Date R

131712 01JAN1987 .
131712 02JAN1987 .
131712 05JAN1987 .
131712 06JAN1987 .
131712 07JAN1987 .
.
.
131712 31DEC2019 67
28829X 01JAN1987 89.5
28829X 02JAN1987 89.7
28829X 05JAN1987 90.66
28829X 06JAN1987 91.34
.
.

From my point of view, I thought that I may set up a new column R_1 that equal to Ri.d-1* Ri,d, then I will set missing if R_1 is less than 50% and (Ri,d or Ri,d-1 less than 100%).
Do you think this solution makes sense, and if it is the case, I am wondering if you can suggest me the codes to do so.

 
Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
smantha
Lapis Lazuli | Level 10
proc sort data=have;
by type date;
run;

data want;
set have;
by type;
array _date(32*366) _temporary_;
array _r(32*366) _temporary_;
retain _date: _r: start;
if first.type then do;
 call missing(of _date:);
 call missing(of _r:);
 start = 0;
end;
start = start+1;
_date[start]=date;
_r[start] = r;
if last.date then do;
 do i = 1 to start;
    if i > 1 then do;
        if _r[i]*_r[i-1] < 50 and (r[i] < 100 or r[i-1] <100) then do;
              r=.;
              date = date[i];
              output;
         end;
         else do;
             r=r[i];
              date = date[i];
              output;
         end;
    end;
     else do;
          r=r[i];
              date = date[i];
              output;
     end;
 end; 
end;
run;
Phil_NZ
Barite | Level 11

Hi @smantha , thank you very much for your help!

 

Unfortunately, when I put the codes into SAS, it results in some errors like that in my log, can you please have a look at it? Thanks

 

95         data want;
96         set have;
97         by type;
98         array _date(32*366) _temporary_;
                         _
                         22
                         200
99         array _r(32*366) _temporary_;
                      _
                      22
                      200
ERROR 22-322: Syntax error, expecting one of the following: ), ',', :.  

ERROR 200-322: The symbol is not recognized and will be ignored.

100        retain _date: _r: start;
101        if first.type then do;
102         call missing(of _date:);
                 _______
                 252
103         call missing(of _r:);
                 _______
                 252
ERROR 252-185: The MISSING subroutine call does not have enough arguments.

104         start = 0;
105        end;
106        start = start+1;
107        _date[start]=date;
108        _r[start] = r;
109        if last.date then do;
110         do i = 1 to start;
111            if i > 1 then do;
112                if _r[i]*_r[i-1] < 50 and (r[i] < 100 or r[i-1] <100) then do;
ERROR: Undeclared array referenced: r.
ERROR: Variable r has not been declared as an array.
ERROR: Undeclared array referenced: r.
ERROR: Variable r has not been declared as an array.
113                      r=.;
114                      date = date[i];
ERROR: Undeclared array referenced: date.
ERROR: Variable date has not been declared as an array.
115                      output;
116                 end;
117                 else do;
118                     r=r[i];
ERROR: Undeclared array referenced: r.
ERROR: Variable r has not been declared as an array.
119                      date = date[i];
ERROR: Undeclared array referenced: date.
ERROR: Variable date has not been declared as an array.
120                      output;
121                 end;
122            end;
123             else do;
124                  r=r[i];
ERROR: Undeclared array referenced: r.
ERROR: Variable r has not been declared as an array.
125                      date = date[i];
ERROR: Undeclared array referenced: date.
ERROR: Variable date has not been declared as an array.
126                      output;
127             end;
128         end;
129        end;
130        run;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.WANT may be incomplete.  When this step was stopped there were 0 observations and 14 variables.
WARNING: Data set WORK.WANT was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
smantha
Lapis Lazuli | Level 10
Instead of 32*36 use 11712. Sorry my bad.
smantha
Lapis Lazuli | Level 10
You can try using lag function which can be simpler.
Data want;
Set have;
By type;
Lag_r =lag(R);
If lag_r*r < 50 and (lag_r<100 or r<100) then r1=.;
Else r1=r;
Run;
Phil_NZ
Barite | Level 11

Hi @smantha

Thank you for your simpler version, but I think with this version, we can not separate the return(R) of different companies, am I correct? Because there would be one case that we multiple r of company 1 in 31/12/2019 with r of company 2 in 1/1/1987. Do you have any solution for it, I am really into this simple solution but it seems that it has not yet been completed.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
Tom
Super User Tom
Super User

@Phil_NZ wrote:

Hi @smantha

Thank you for your simpler version, but I think with this version, we can not separate the return(R) of different companies, am I correct? Because there would be one case that we multiple r of company 1 in 31/12/2019 with r of company 2 in 1/1/1987. Do you have any solution for it, I am really into this simple solution but it seems that it has not yet been completed.


If the data is grouped then use BY group processing.  The code had a BY statement, but didn't use it for anything.  So reset the new LAG_R value to missing when you start a new group.  Since missing is less than any actual number that will mean the IF test will succeed and R1 will be missing for the first observation of each group.   If you want something else for R1 for the first observation then change the logic to reflect that.

data want;
  set have;
  by type;
  lag_r =lag(R);
  if first.type then lag_r=.;
  if lag_r*r < 50 and (lag_r<100 or r<100) then r1=.;
  else r1=r;
run;

 

Phil_NZ
Barite | Level 11

Hi @smantha

Thank you for your correction, but when I change the number to 11712, there is another error emerged as below, can you please help me to have a look at it. Many thanks and cheers.

data want;
set have;
by type;
array _date(11712) _temporary_;
array _r(11712) _temporary_;
retain _date: _r: start;
if first.type then do;
 call missing(of _date:);
 call missing(of _r:);
 start = 0;
end;
start = start+1;
_date[start]=date;
_r[start] = r;
if last.date then do;
 do i = 1 to start;
    if i > 1 then do;
        if _r[i]*_r[i-1] <= 50 and (r[i] > 100 or r[i-1] > 100) then do;
              r=.;
              date = date[i];
              output;
         end;
         else do;
             r=r[i];
              date = date[i];
              output;
         end;
    end;
     else do;
          r=r[i];
              date = date[i];
              output;
     end;
 end; 
end;
run;


28 data want; 29 set have; 30 by type; 31 array _date(11712) _temporary_; 32 array _r(11712) _temporary_; 33 retain _date: _r: start; 34 if first.type then do; 35 call missing(of _date:); _______ 252 36 call missing(of _r:); _______ 252 ERROR 252-185: The MISSING subroutine call does not have enough arguments. 37 start = 0; 38 end; 39 start = start+1; 40 _date[start]=date; 41 _r[start] = r; 42 if last.date then do; 43 do i = 1 to start; 44 if i > 1 then do; 45 if _r[i]*_r[i-1] <= 50 and (r[i] > 100 or r[i-1] > 100) then do; ERROR: Undeclared array referenced: r. ERROR: Variable r has not been declared as an array. ERROR: Undeclared array referenced: r. ERROR: Variable r has not been declared as an array. 46 r=.; 47 date = date[i]; 2 The SAS System 21:50 Thursday, July 9, 2020 ERROR: Undeclared array referenced: date. ERROR: Variable date has not been declared as an array. 48 output; 49 end; 50 else do; 51 r=r[i]; ERROR: Undeclared array referenced: r. ERROR: Variable r has not been declared as an array. 52 date = date[i]; ERROR: Undeclared array referenced: date. ERROR: Variable date has not been declared as an array. 53 output; 54 end; 55 end; 56 else do; 57 r=r[i]; ERROR: Undeclared array referenced: r. ERROR: Variable r has not been declared as an array. 58 date = date[i]; ERROR: Undeclared array referenced: date. ERROR: Variable date has not been declared as an array. 59 output; 60 end; 61 end; 62 end; 63 64 %LET _CLIENTTASKLABEL=; 65 %LET _CLIENTPROCESSFLOWNAME=; 66 %LET _CLIENTPROJECTPATH=; 67 %LET _CLIENTPROJECTPATHHOST=; 68 %LET _CLIENTPROJECTNAME=; 69 %LET _SASPROGRAMFILE=; 70 %LET _SASPROGRAMFILEHOST=; 71 72 ;*';*";*/;quit;run; ____ 180 ERROR 180-322: Statement is not valid or it is used out of proper order. NOTE: The SAS System stopped processing this step because of errors. WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0 observations and 14 variables. WARNING: Data set WORK.WANT was not replaced because this step was stopped. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.00 seconds 73 ODS _ALL_ CLOSE; 74 75 76 QUIT; RUN; 77
 
Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
Kurt_Bremser
Super User

Testing revealed that you need to call the variables as a variable list:

if first.type then do;
 call missing(of _date1-_date11712);
 call missing(of _r1-_r11712);
 start = 0;
end;
FreelanceReinh
Jade | Level 19

@smantha and @Kurt_Bremser:


call missing(of _date1-_date11712);
call missing(of _r1-_r11712);

This would create a lot of new numeric variables while leaving the values of the temporary array elements unchanged. To set the array elements to missing I would use:

call missing(of _date[*]);
call missing(of _r[*]);

Similarly, _date: and _r: in the RETAIN statement does not refer to the array elements, but to non-existing variables, while the temporary array elements are retained by default anyway.

smantha
Lapis Lazuli | Level 10
data want;
set have;
by type;
array _date(11712) _temporary_;
array _r(11712) _temporary_;
retain _date: _r: start;
if first.type then do;
 call missing(of _date1-_date11712);
 call missing(of _r1-_r11712);
 start = 0;
end;
start = start+1;
_date[start]=date;
_r[start] = r;
if last.date then do;
 do i = 1 to start;
    if i > 1 then do;
        if _r[i]*_r[i-1] <= 50 and (_r[i] > 100 or _r[i-1] > 100) then do;
              r=.;
              date = _date[i];
              output;
         end;
         else do;
             r=_r[i];
              date = _date[i];
              output;
         end;
    end;
     else do;
          r=_r[i];
              date = _date[i];
              output;
     end;
 end; 
end;
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 1143 views
  • 7 likes
  • 5 in conversation