sonikm24
Calcite | Level 5

Hello,

I am using a web crawler to find multiple instances of a keyword in SEC files (in the attached code I search for "Notional"). I use the prxnext function to find the matches, but I am having trouble outputting the lines surrounding each keyword. I want to control how many lines are output around every instance of the keyword in the SEC file. For example, if there are 5 instances of "Notional" in the file, I want to output the lines surrounding each of them. In the code, I use the following lines for that purpose:

    if (0 < countC2 <= 10) then do;
        output;
    end;

But changing 10 to 15 or 5 neither increases nor decreases the number of lines output around the keywords. Can anyone help with this issue? I have attached the code and a sample Excel file.

Thanks.

Sonik Mandal

4 REPLIES
Reeza
Super User

If I understand your problem, which I'm not sure I do, you can't simply change a single parameter in the code you have to get extra lines.

SAS processes data line by line, so it's more complex than that.

I don't usually say this, but I question whether SAS is the best tool for this type of work. Not that it can't be done; it's more a question of whether it should be.

The Kimono interface is fairly good; see the Kimono blog.

Peter_C
Rhodochrosite | Level 12

@sonikm24

Some time ago it was important to highlight issues relevant to Y2K compliance. To show the context of each issue, my code buffered program lines in blocks whose size was controlled by a macro variable (I started with 3, but the client needed 5). The code used ARRAYs to buffer the lines. You might have a similar concern: there are multiple strings to target, and their context blocks must be allowed to overlap.

The code was not concise.

Best of luck with your challenge.

peterC

Astounding
PROC Star

It looks like you already have a SAS data set by the time you search for NOTIONAL.  In that case, finding 5 lines doesn't have to be terribly difficult.  You might have decisions to make if you find NOTIONAL on the first line (for example) ... this solution would take a maximum of 5 lines:  the line itself, plus 2 before and 2 after (assuming that those lines actually exist).

 

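data SiteVisitnew;
    set SiteVisitnew nobs=_total_obs_;
    /* prxparse returns a nonzero pattern ID whenever the regex compiles */
    patternID = prxparse('/NOTIONAL/');
    /* as written, this tests the ID rather than a match on the current line */
    if patternID then do j=max(1, _n_-2) to min(_total_obs_, _n_+2);
        /* re-read row j by observation number: up to 2 before and 2 after */
        set SiteVisitnew point=j;
        output;
    end;
    drop j;
run;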

I hope I selected properly based on patternID, but that would be easy to fix if it's wrong.
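
On the selection itself: prxparse only compiles the pattern and returns its ID, so a condition on the ID alone is true for every row. A minimal sketch that tests each line with prxmatch instead is below. It assumes the text sits in a character variable called line, and it uses a made-up data set demo_lines, an output data set context, and a macro variable ctx for the number of lines kept on each side. None of these names come from the attached code.

```sas
/* Hypothetical sample data standing in for the parsed SEC file. */
data demo_lines;
    infile datalines truncover;
    input line $char80.;
    datalines;
Item 1. Overview
The notional amount was 5 million.
Item 2. Risk factors
Item 3. Legal proceedings
Item 4. Other
Item 5. Swaps with Notional value
Item 6. Exhibits
;
run;

%let ctx = 1;   /* assumed name: lines of context on each side */

data context;
    set demo_lines nobs=total_obs;
    retain re_id;
    /* compile the case-insensitive pattern once, on the first row */
    if _n_ = 1 then re_id = prxparse('/notional/i');
    /* prxmatch returns the match position, or 0 when the line has no match */
    if prxmatch(re_id, line) then
        do j = max(1, _n_ - &ctx) to min(total_obs, _n_ + &ctx);
            set demo_lines point=j;   /* direct read of row j */
            output;
        end;
    drop j re_id;
run;
```

With this layout, changing ctx should be the only edit needed to widen or narrow the window around each match.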

Note that the same line might be selected twice, if NOTIONAL appears twice in close proximity.  There are ways to handle that, but you would have to define first what "handling that" actually means.
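
If "handling that" means keeping each line only once, in file order, one option is to remember the last row already written and start the next window after it. This works because the matches are visited in increasing row order. The sketch below is only an illustration: the data set demo_lines, the variable line, the output data set context_once, and the counter last_out are all assumed names.

```sas
/* Hypothetical sample data: matches on rows 2 and 4, so the windows overlap. */
data demo_lines;
    infile datalines truncover;
    input line $char80.;
    datalines;
Header text
First notional amount
Filler text
Second notional amount
Footer text
;
run;

data context_once;
    set demo_lines nobs=total_obs;
    retain re_id last_out 0;
    if _n_ = 1 then re_id = prxparse('/notional/i');
    if prxmatch(re_id, line) then do;
        /* skip any rows that an earlier window has already written */
        do j = max(1, last_out + 1, _n_ - 2) to min(total_obs, _n_ + 2);
            set demo_lines point=j;
            output;
        end;
        /* remember where this window ended */
        last_out = min(total_obs, _n_ + 2);
    end;
    drop j re_id last_out;
run;
```

Other definitions are possible, for example keeping the duplicates but adding a variable that identifies which match each block belongs to.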

sonikm24
Calcite | Level 5

Hello @Astounding,

I inserted the snippet of code from your message above into my SAS code. I have attached the integrated code for your reference (and also an Excel file to test with). But when I run the code, the full SEC file is returned, not just the required lines.

Please let me know if I made a mistake when adding your code to mine (I just commented out the data SiteVisitnew part of my code and added your code).

Thanks.

Sonik Mandal
