I would like to find out all sentences in financial reports that containing specific words for a sample of firms, and each firm has different searching words and different searching periods. The data structure is as follows:
No. Company word1 date1 date2
1 A blue 1998 2004
2 B apple 1999 2011
3 C cat 2000 2002
….
Because I would like to write sas macro code to search one firm each time, are there any efficient ways that allow me to set searching criteria (searching words and periods) each time?
For example for company A, one time search would be
%search(INSET=data1,OUTSET=data2,searchingwords=”blue”, company=A, begdate=1998, enddate=2004);
I would like to know if it is possible to write a general code that automatically changes its searching criteria when one firm is searched after another? I really appreciate your help!
First if you want to generate multiple lines using CALL EXECUTE() do it in the same data step!
Second if your macro is at all complicated you do not what to push unquoted %LET and macro calls onto the stack. The %LET and macro call will run when the code is pushed and the resulting SAS code is what it pulled back off. Instead of pulling off the actual %LET and macro call. Use %NRSTR() to delay those statement until they are actually run after the current data step.
Why are you making the macro variable anyway if all you going to use it for is to pass it to the macro?
inname=cats('x',n');
outname=cats('out',n);
call execute(catx(' ','data',inname,';set full_sample;'
,'where company=',quote(trim(company))
,'and year between',begdate,'and',enddate,';run;'
));
call execute(cats(%nrstr(%search)(inset=',inname,',outset=',outname,',searchwords=',searchingwords,');'));
The whole thing can be done with a single SQL query, no macro needed. Please provide example input and desired output datasets for more detailed answers.
Thank you very much for your quick response. Since I need to search in a certain data set, I am kindof required to use the macro "search" sas code provided by that data owner to search in the data set.
It sounds like you are asking to use the data you provided to generate a series of calls to a macro that you have not shown. Not clear if it is efficient at all since you haven't explained what the macro does. But to generate a macro call for every observation in a dataset is pretty easy. You could use CALL EXECUTE(), but I find it is easier to debug if you just write the code to a temporary file and then %include it.
So if your data looks like this:
data search_list;
input Company $ searchingwords $ begdate enddate;
cards;
A blue 1998 2004
B apple 1999 2011
C cat 2000 2002
;
Your program to generate the macro calls and then run them could look like this.
filename code temp;
data _null_;
set search_list;
file code;
put '%search(inset=data1,outset=data2,' searchingwords= ',' company= ',' begdate= ',' enddate= ')';
run;
%include code / source2;
Hi Tom, really appreciate your help! I checked the code, and find that the search process in the data set needs to 2 step: 1) get a subsample from the full data set by restricting to the firm and the time period; 2) use macro to parse out the lines including the searching word. For example, for company A, 1) step 1 generates a subset: data data1;set full_data;where company="A" and begdate<=date<=enddate;run; 2) step 2 parses out the relevant lines from data1 into data2: %search(inset=data1, outset=data2, searchingword="blue").
I do not know if the 2-step searching process can be performed using call execute. Also I would like to combine the output data2 for all observations into one file. I am not sure if this would impose more complications...
Now it sounds like you are asking how to create a macro?
The first step is to have working code and then replace the variable parts with macro parameters and/or macro logic.
Can you show how the working code and explain what parts need to vary and how those parts match to the metadata table you first showed?
Sounds like you want something like:
data want;
set have ;
where company="A" and year(date) between 2000 and 2015 ;
where also index(source_string,"target_string") ;
run;
Hi Tom, this is what I current do using the most naive to execute:
Step 1 generates the subsample x"n" (n refer to nth observation):
data null;
set search_list;
call execute ('data x'!!strip(n)!!',set full_sample;where company="'!!strip(company)!!'" and year between '!!strip(begdate)!!' and '!!strip(enddate)!!';run;');
run;
Step 2 based on x"n" from step 1, further generates out"n" for parsed lines;
data _null;
set search_list;
call execute(cats('%let string=',searchingwords,';%search(inset=x',n,',outset=ou',n,',searchwords=&string);'));
run;
Step 3 to combine all out"n" together for further processing.
The above are the very naive codes I currently use.
@dayuan wrote:
Hi Tom, really appreciate your help! I checked the code, and find that the search process in the data set needs to 2 step: 1) get a subsample from the full data set by restricting to the firm and the time period; 2) use macro to parse out the lines including the searching word. For example, for company A, 1) step 1 generates a subset: data data1;set full_data;where company="A" and begdate<=date<=enddate;run; 2) step 2 parses out the relevant lines from data1 into data2: %search(inset=data1, outset=data2, searchingword="blue").
I do not know if the 2-step searching process can be performed using call execute. Also I would like to combine the output data2 for all observations into one file. I am not sure if this would impose more complications...
We can't write the code for you here. You need to either write your code for a 'base case' ie non macro code that will do this and we can help you turn it into a macro. We don't know the steps or the data you're using. If you already have working code, post that and we can help turn it into a macro. Nothing from what you've indicated above would restrict you from using CALL EXECUTE. All CALL EXECUTE does is call the macro with the specific parameters, which we're assuming are in a data set somewhere, otherwise how do you know which words to search for which company.
If you want to try that yourself here's a tutorial:
https://github.com/statgeek/SAS-Tutorials/blob/master/Turning%20a%20program%20into%20a%20macro.md
Hi Reeza, I really appreciate your help! I currently used the following very naive code (this is my first time to use call function...) I am very thankful if you could kindly let me know any improvement that makes the code more efficient.
Step 1 generates the subsample x"n" (n refer to nth observation):
data null;
set search_list;
call execute ('data x'!!strip(n)!!',set full_sample;where company="'!!strip(company)!!'" and year between '!!strip(begdate)!!' and '!!strip(enddate)!!';run;');
run;
Step 2 based on x"n" from step 1, further generates out"n" for parsed lines;
data _null;
set search_list;
call execute(cats('%let string=',searchingwords,';%search(inset=x',n,',outset=ou',n,',searchwords=&string);'));
run;
Step 3 to combine all out"n" together for further processing.
Hi Reeza, following is the code for one case.
data date1;set full_sample;
if company="A" and begdate<=year<=enddate;
run;
%let string="blue";
%search(inset=data1, outset=data2, searchingword=&string);
@dayuan wrote:
Hi Reeza, following is the code for one case.
data date1;set full_sample;
if company="A" and begdate<=year<=enddate;
run;
%let string="blue";
%search(inset=data1, outset=data2, searchingword=&string);
If the macro supports dataset options you could do that in one call, no need to an extra data step.
%search
(inset=full_sample(where=(company="A" and begdate<=year<=enddate))
,outset=data2
,searchingword=blue
);
PS Why do you keep adding quotes into your macro variable for the search term? Does the macro expect the value to be inside of quotes?
First if you want to generate multiple lines using CALL EXECUTE() do it in the same data step!
Second if your macro is at all complicated you do not what to push unquoted %LET and macro calls onto the stack. The %LET and macro call will run when the code is pushed and the resulting SAS code is what it pulled back off. Instead of pulling off the actual %LET and macro call. Use %NRSTR() to delay those statement until they are actually run after the current data step.
Why are you making the macro variable anyway if all you going to use it for is to pass it to the macro?
inname=cats('x',n');
outname=cats('out',n);
call execute(catx(' ','data',inname,';set full_sample;'
,'where company=',quote(trim(company))
,'and year between',begdate,'and',enddate,';run;'
));
call execute(cats(%nrstr(%search)(inset=',inname,',outset=',outname,',searchwords=',searchingwords,');'));
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.