Shmuel
Garnet | Level 18

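@ChrisNZ - there is a typo in your code. The line:

POS+find(STR,'SSA-',POS)+1;

should be:

POS=find(STR,'SSA-',POS)+1;

Otherwise the code skips existing hosts. POS+...; is a SAS sum statement, so it adds the position returned by FIND to the current value of POS instead of replacing it, and the next search starts too far to the right.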
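A minimal sketch contrasting the two forms. It uses a hypothetical four-host string rather than the OP's data, and the data set SUMDEMO and the variables P, Q, NA and NS are illustrative names, not from the original posts. Each loop counts how many "SSA-" matches it visits.

```sas
/* Hypothetical demo: assignment vs. sum statement inside a FIND loop. */
data SUMDEMO;
  /* Four "hosts" starting at positions 1, 6, 11 and 16, plus padding. */
  STR='SSA-aSSA-bSSA-cSSA-d xxxxx';

  /* Assignment: the next search starts just after the previous match. */
  NA=0; P=1;
  do while(find(STR,'SSA-',P));
    NA+1;
    P=find(STR,'SSA-',P)+1;
  end;

  /* Sum statement: Q = Q + (match position + 1), so Q grows too fast  */
  /* (1 -> 3 -> 10 -> 22) and the host at position 16 is never found.  */
  NS=0; Q=1;
  do while(find(STR,'SSA-',Q));
    NS+1;
    Q+find(STR,'SSA-',Q)+1;
  end;

  put NA= NS=;   /* assignment visits 4 matches, sum statement only 3 */
  keep NA NS;
run;
```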

ChrisNZ
Tourmaline | Level 20
Can't test now, but you make sense. 🙂 My code works as shown but would have failed for a third match.
ChrisNZ
Tourmaline | Level 20

This works better:

data HAVE;
  STR='Uptime:-FrustratedSSA-COLUMB-MS-N5EF222:49:44SSA-GREENV-MS-N629222:49:47:BGP SSA-GREENV-MS-N629222:49';
run;

data WANT;
  set HAVE;
  keep HOST;
  POS=0;
  do while(find(STR,'SSA-',POS+1));
    POS=find(STR,'SSA-',POS+1);
    HOST=substr(STR,POS,19);
    output;
  end;
run;

proc print noobs; 
run;

Patrick
Opal | Level 21

@ChrisNZ wrote: "As much as I like RegEx, they are overkill here."

 

I don't agree, especially because there is an example in the documentation that works almost unchanged for the problem at hand.

 

Your code relies on "SSA-" being sufficient to identify a hostname. Maybe that's true, maybe not. If you use a RegEx to begin with, then only the RegEx needs to be amended should the pattern "SSA-" turn out not to be sufficient.

 

ChrisNZ
Tourmaline | Level 20

@Patrick That's my point: in this case, the start marker ("SSA-") and the length of the substring to extract are constant and known.

 

Therefore, code that is shorter, much less resource-intensive, easier to read and maintain, and still does the job perfectly is better suited imho.

 

To each their own. 🙂

Patrick
Opal | Level 21

@ChrisNZ

Regarding resource usage: make sure that you explicitly define the length of variable host. Otherwise it defaults to the length of the source variable str, which is 32 KB, and that causes a big and unnecessary performance degradation.

  length host $19;
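A minimal sketch of ChrisNZ's FIND loop from earlier in the thread with that LENGTH statement added. Because HOST is declared before its first assignment, SAS uses $19 instead of deriving the length from SUBSTR's source argument. The value 19 assumes every host name has the form SSA-xxxxxx-xx-xxxxx, as in the sample string.

```sas
data HAVE;
  STR='Uptime:-FrustratedSSA-COLUMB-MS-N5EF222:49:44SSA-GREENV-MS-N629222:49:47:BGP SSA-GREENV-MS-N629222:49';
run;

data WANT;
  set HAVE;
  length HOST $19;   /* declare before first use so SUBSTR does not set the length */
  keep HOST;
  POS=0;
  do while(find(STR,'SSA-',POS+1));
    POS=find(STR,'SSA-',POS+1);   /* position of the next "SSA-"      */
    HOST=substr(STR,POS,19);      /* fixed-length host name from there */
    output;
  end;
run;

proc print data=WANT noobs;
run;
```

With this sample string the step writes three rows: SSA-COLUMB-MS-N5EF2 followed by SSA-GREENV-MS-N6292 twice.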
ChrisNZ
Tourmaline | Level 20

@Patrick Regardless, RegEx is always much more CPU-intensive than "standard" string functions.

 

data _null_; do I=1 to 1e8; A=index('ba','a'); end; run;

NOTE: DATA statement used (Total process time):
real time 1.37 seconds
cpu time 1.37 seconds


data _null_; do I=1 to 1e8; A=prxmatch('/a/o','ba'); end; run;

NOTE: DATA statement used (Total process time):
real time 16.79 seconds
cpu time 16.78 seconds

Patrick
Opal | Level 21

@ChrisNZ Agreed in principle. However, when I ran the two programs without the explicit LENGTH statement and without COMPRESS turned on, your version took significantly more real time on my machine.

options fullstimer;
data HAVE;
  length str $32767;
  STR=repeat('Uptime:-FrustratedSSA-COLUMB-MS-N5EF222:49:44SSA-GREENV-MS-N629222:49:47:BGP SSA-GREENV-MS-N629222:49 X',324);
  do i=1 to 100;
    output;
  end;
run;

data WANT1;
  set HAVE;
/*  length host $19;*/
  keep HOST;
  POS=0;
  do while(find(STR,'SSA-',POS+1));
    POS=find(STR,'SSA-',POS+1);
    HOST=substr(STR,POS,19);
    output;
  end;
run;


data want2(keep=hos_t_name);
  set have;
  length hos_t_name $19;
  retain _t_prxid 0;
  if _n_=1 then _t_prxid=prxparse('/ssa-\w{6}-\w{2}-\w{5}/i');

  _t_start = 1;
  _t_stop = lengthn(STR);
  call prxnext(_t_prxid, _t_start, _t_stop, STR, _t_pos, _t_len);
    do while (_t_pos > 0);
       hos_t_name = substr(STR, _t_pos, _t_len);
       output;
       call prxnext(_t_prxid, _t_start, _t_stop, STR, _t_pos, _t_len);
    end;
run;
24         data WANT1;
25           set HAVE;
26         /*  length host $19;*/
27           keep HOST;
28           POS=0;
29           do while(find(STR,'SSA-',POS+1));
30             POS=find(STR,'SSA-',POS+1);
31             HOST=substr(STR,POS,19);
NOTE: Variable "HOST" was given a default length of 32767 as the result of a function call.  If you do not like this, please use a 
      LENGTH statement to declare "HOST".
32             output;
33           end;
34         run;

NOTE: There were 100 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT1 has 95400 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           3.78 seconds
      user cpu time       0.51 seconds
      system cpu time     0.93 seconds
      memory              1523.56k
      OS Memory           15096.00k
2                                                          The SAS System                            10:27 Saturday, January 7, 2017

      Timestamp           07/01/2017 10:30:30 AM
      Step Count                        11  Switch Count  72
      

35         
36         
37         data want2(keep=hos_t_name);
38           set have;
39           length hos_t_name $19;
40           retain _t_prxid 0;
41           if _n_=1 then _t_prxid=prxparse('/ssa-\w{6}-\w{2}-\w{5}/i');
42         
43           _t_start = 1;
44           _t_stop = lengthn(STR);
45           call prxnext(_t_prxid, _t_start, _t_stop, STR, _t_pos, _t_len);
46             do while (_t_pos > 0);
47                hos_t_name = substr(STR, _t_pos, _t_len);
48                output;
49                call prxnext(_t_prxid, _t_start, _t_stop, STR, _t_pos, _t_len);
50             end;
51         run;

NOTE: There were 100 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT2 has 95400 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           2.44 seconds
      user cpu time       2.23 seconds
      system cpu time     0.03 seconds
      memory              1526.00k
      OS Memory           14636.00k
      Timestamp           07/01/2017 10:30:32 AM
      Step Count                        12  Switch Count  80
ChrisNZ
Tourmaline | Level 20

True.

Setting the length reduces I/Os and avoiding RegEx saves CPU.

ballardw
Super User

@ChrisNZ wrote:

This works better:

 

 

 


Maybe the OP will tell/show us what the output should actually look like. 😕
