Solved: Re: How to merge two similar pieces of code in SAS

Phil_NZ · Posted 07-18-2020 05:21 PM

Hi community,

I have two nearly identical sets of code that process data for two different variables. Each set includes proc sort, proc means, a merge, proc rank, and a step that trims the data. The order of the statements is the same in both sets; the only difference is the name of the variable.

I can reach my goal with these two sets of code, but having seen how elegant SAS code can be, I believe there should be a way to merge them, especially since they resemble each other so closely. Can the two sets of code be combined? If so, could you suggest what I should search for or read up on?

Thanks in advance.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.

mkeintz · Posted 07-18-2020 07:00 PM

If you have a lot of common code preceding or following these two distinct data steps, then you could embed them in a macro definition as below:

%macro mytask(dsn=);

  **** preceding common code here *****;

  %if &dsn=amihud %then %do;
    data amihud_;
      set finish_error;
      by Type;
      amihud=abs_r/trading_vol;
      year=year(date);
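      /* call missing sets both variables to missing; "trading_vol=. and amihud=." would assign 0 to trading_vol */
      if first.type then call missing(trading_vol, amihud);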
    run;
  %end;
  %else %if &dsn=closing %then %do;
    data closing_;
      set finish_error;
      by Type;
      b=divide(pa_us-pb_us,mean_pa_pb);
      if b> 0.5*mean_pa_pb then b=.;
      year=year(date);
    run;
  %end;

  **** trailing common code here *****;

%mend mytask;

Note I assume that these two distinct steps occur in precisely the same relative position within the common code.

Then all you need to do is invoke the macro based on your needs. Below, the macro is invoked twice, once for each distinct data step:

%mytask(dsn=amihud);
%mytask(dsn=closing);
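
Where the two code sets differ only in a name, the name itself can be substituted with a macro variable instead of branching with %if. The following is a minimal sketch of that idea applied to the rank-and-trim tail of the original poster's code. The macro name trim_tail, the output name b_final, and the toy create_var data are assumptions made for illustration; the sort by Type is left out for brevity.

```sas
/* Toy stand-in for the poster's create_var data set (values are made up). */
data create_var;
  call streaminit(1);
  do year = 2018 to 2019;
    do obs = 1 to 200;
      lag_p_us = rand('uniform') * 100;
      p_us = lag_p_us + rand('normal');
      output;
    end;
  end;
run;

/* Hypothetical macro: only the variable name changes between the two runs. */
%macro trim_tail(var=);
  /* Rank lag_p_us into percentile groups 0-99 within each year. */
  proc rank data=create_var groups=100 out=temp;
    by year;
    var lag_p_us;
    ranks rank;
  run;

  /* Keep groups 1 through 98 and name the output after the variable. */
  data &var._final (drop=p_us lag_p_us rank rename=(obs=obs_&var));
    set temp;
    if 0 < rank < 99;
  run;
%mend trim_tail;

%trim_tail(var=amihud);
%trim_tail(var=b);
```

The two calls produce amihud_final and b_final, with obs renamed to obs_amihud and obs_b respectively. The period in &var._final marks the end of the macro variable name so that _final is treated as literal text.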

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------


PaigeMiller · Posted 07-18-2020 05:56 PM

I'm not aware of any way to "merge code" other than by applying human intelligence and effort to the process.

--
Paige Miller

smantha · Posted 07-18-2020 06:01 PM

You would want to use a SAS macro:
%macro mycode(indsname, outdsname);
<your code>
%mend mycode;
%mycode(<input data set>, <output data set>);
The thing to remember is that the input to the first step would be indsname. For example, if your first step is a data step, then:
Data new;
Set &indsname.;
Run;
If your first step is a proc, then:
Proc <name of proc> data=&indsname.;

Similarly, for the output, if your last step is a data step, then:
Data &outdsname.;
Set <some data set>;
Run;
If your last step is a proc, then specify outdsname on the out= option, as shown below:
out = &outdsname.
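
Put together, the pattern above might look like the following sketch. It uses sashelp.class, a sample data set shipped with SAS, as the input; the intermediate data set work._sorted, the flag variable first_in_group, and the output name work.class_flagged are hypothetical names chosen for illustration.

```sas
%macro mycode(indsname, outdsname);
  /* First step is a proc: the input data set comes from the first parameter. */
  proc sort data=&indsname. out=work._sorted;
    by sex;
  run;

  /* Last step is a data step: the output data set comes from the second parameter. */
  data &outdsname.;
    set work._sorted;
    by sex;
    first_in_group = first.sex;  /* 1 on the first row of each sex group, else 0 */
  run;
%mend mycode;

/* Positional parameters are supplied in the order they were declared. */
%mycode(sashelp.class, work.class_flagged);
```

Calling the macro again with a different pair of data set names reruns the same steps without copying any code.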

Kurt_Bremser · Posted 07-18-2020 06:11 PM

This might be a case for macro processing, but we will need to see the code first.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Phil_NZ · Posted 07-18-2020 06:22 PM

Hi @Kurt_Bremser , one part of my code is as below:

options compress=yes reuse=yes;
data finish_error;
set 'D:\link_to_the_dataset';
run;
proc sort data=finish_error;
by Type date;
run;
data amihud_;
set finish_error;
by Type;
amihud=abs_r/trading_vol;
year=year(date);
if first.type then trading_vol=. and amihud=.;
run;
.
.
.
proc rank data=create_var groups=100 out=temp;
by year;
var lag_p_us;
ranks rank;
run;

data trim_price;
set temp;
if 0 < rank < 99;
run;
proc sort data=trim_price;
by Type;
run;
data amihud_final (drop= p_us lag_p_us rank rename=(obs=obs_amihud));
set trim_price;
run;

Thank you!


Phil_NZ · Posted 07-18-2020 06:26 PM

This code is for the variable amihud. I have similar code for another variable named "b". The code for handling them is the same apart from the calculation here:
for amihud:

data amihud_;
set finish_error;
by Type;
amihud=abs_r/trading_vol;
year=year(date);
if first.type then trading_vol=. and amihud=.;
run;

for b:

data closing_;
set finish_error;
by Type;
b=divide(pa_us-pb_us,mean_pa_pb);
if b> 0.5*mean_pa_pb then b=.;
year=year(date);
run;

The rest of the code in these two sets is identical.

Thanks!


Tom · Posted 07-18-2020 06:44 PM

Is the question how to create both variables in one data step?

If so it looks like it should be possible.

Note that your first data step has either a mistake or a very strange syntax that should probably be commented to explain to yourself what it means.

data amihud_closing;
  set finish_error;
  by Type;
  year=year(date);
* Calculate AMIHUD ;
  amihud=abs_r/trading_vol;
 * if first.type then trading_vol=. and amihud=.;
  if first.type then call missing(trading_vol,amihud);
* Calculate B ;
  b=divide(pa_us-pb_us,mean_pa_pb);
  if b> 0.5*mean_pa_pb then b=.;
run;

Phil_NZ · Posted 07-18-2020 07:05 PM

Hi @Tom, can you please quote the data step that you feel is strange? I am going to have a careful look at it. Thanks


Phil_NZ · Posted 07-18-2020 07:49 PM

Or do you mean this one?
data finish_error;
set 'D:\link_to_the_dataset';
run;

This is how I read in my dataset using SAS EG.


Tom · Posted 07-19-2020 10:48 AM

@Phil_NZ wrote:

Hi @Tom, can you please quote the data step that you feel is strange? I am going to have a careful look at it. Thanks

I left the line in the code, only as a comment. This statement:

trading_vol=. and amihud=.;

is setting TRADING_VOL to zero and leaves AMIHUD unchanged. SAS reads it as assigning the boolean expression (. and amihud=.) to TRADING_VOL. Boolean expressions evaluate to either zero or one, and a missing value counts as false, so the result is always zero.

I replaced it with:

call missing(trading_vol,amihud);

which will set both TRADING_VOL and AMIHUD to missing.
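
A quick way to see the difference is to run both statements in a throwaway data step and write the values to the log. This is a small sketch with made-up starting values, not part of the original code:

```sas
data _null_;
  /* Made-up starting values so that any change is visible in the log. */
  trading_vol = 100;
  amihud = 5;

  /* Original statement: parsed as trading_vol = (. and (amihud = .)) */
  trading_vol = . and amihud = .;
  put 'After the original statement: ' trading_vol= amihud=;

  /* Replacement: sets every listed variable to missing. */
  call missing(trading_vol, amihud);
  put 'After CALL MISSING: ' trading_vol= amihud=;
run;
```

The first PUT line should show trading_vol=0 with amihud still equal to 5, and the second should show both variables as missing (.).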

Phil_NZ · Posted 07-19-2020 03:53 PM

Awesome, thank you @Tom for this comprehensive explanation.


