@ChrisNZ - there is a typo in your code:
line:
POS+find(STR,'SSA-',POS)+1;
should be:
POS=find(STR,'SSA-',POS)+1;
otherwise the code skips existing hosts.
This works better:
data HAVE;
STR='Uptime:-FrustratedSSA-COLUMB-MS-N5EF222:49:44SSA-GREENV-MS-N629222:49:47:BGP SSA-GREENV-MS-N629222:49';
run;
data WANT;
set HAVE;
keep HOST;
POS=0;
do while(find(STR,'SSA-',POS+1));
POS=find(STR,'SSA-',POS+1);
HOST=substr(STR,POS,19);
output;
end;
run;
proc print noobs;
run;
@ChrisNZ As much as I like RegEx, they are overkill here.
I don't agree with you, especially because there is an example in the documentation which works almost unchanged for the problem at hand.
Your code relies on "SSA-" being sufficient to identify a hostname. Maybe that's true, maybe not. If you use a RegEx to begin with, then you only need to amend the RegEx, and nothing else, in case the pattern "SSA-" doesn't suffice.
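To illustrate that point, here is a minimal sketch in which only the pattern changes. The second prefix "SSB-" and the sample string are hypothetical, made up purely for illustration; the rest follows the usual PRXPARSE / CALL PRXNEXT loop.

```sas
data _null_;
  length HOST $19;
  /* Sample string with a hypothetical second prefix "SSB-" (an assumption) */
  STR='SSA-COLUMB-MS-N5EF2 SSB-GREENV-MS-N6292';
  /* Only this pattern would need amending if "SSA-" is not enough */
  ID=prxparse('/ss[ab]-\w{6}-\w{2}-\w{5}/i');
  START=1;
  STOP=lengthn(STR);
  call prxnext(ID,START,STOP,STR,POS,LEN);
  do while(POS>0);
    HOST=substr(STR,POS,LEN);
    put HOST=;   /* write each match to the log */
    call prxnext(ID,START,STOP,STR,POS,LEN);
  end;
run;
```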
@Patrick That's my point: in this case, the start and the length of the substring to extract are constant and known.
Therefore, code that is shorter, much less resource-intensive, easier to read and maintain, and still does the job perfectly is better suited imho.
To each their own.. 🙂
With regard to resource usage: make sure that you explicitly define the length of variable host, as otherwise it will default to the length of the source variable str, which is 32KB, and that would cause a big and unnecessary performance degradation.
length host $19;
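For clarity, a sketch of where that statement could go in the FIND-based step. It assumes the HAVE data set posted earlier in the thread (recreated here so the example runs on its own) and restructures the loop slightly so FIND is called once per iteration; the LENGTH statement has to come before the first assignment to HOST.

```sas
data HAVE;
  STR='Uptime:-FrustratedSSA-COLUMB-MS-N5EF222:49:44SSA-GREENV-MS-N629222:49:47:BGP SSA-GREENV-MS-N629222:49';
run;

data WANT;
  set HAVE;
  length HOST $19;          /* declared before HOST is first assigned */
  keep HOST;
  POS=0;
  do until(POS=0);
    /* search for the next 'SSA-' after the previous hit */
    POS=find(STR,'SSA-',POS+1);
    if POS then do;
      HOST=substr(STR,POS,19);
      output;
    end;
  end;
run;

proc print noobs;
run;
```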
@Patrick Regardless, RegEx is always much more CPU-intensive than "standard" string functions.
data _null_; do I=1 to 1e8; A=index('ba','a'); end; run;
NOTE: DATA statement used (Total process time):
real time 1.37 seconds
cpu time 1.37 seconds
data _null_; do I=1 to 1e8; A=prxmatch('/a/o','ba'); end; run;
NOTE: DATA statement used (Total process time):
real time 16.79 seconds
cpu time 16.78 seconds
@ChrisNZ Agree in principle. However, when I ran the two programs without the explicit length statement and without compression turned on, your version took significantly more real time on my machine.
options fullstimer;
data HAVE;
length str $32767;
STR=repeat('Uptime:-FrustratedSSA-COLUMB-MS-N5EF222:49:44SSA-GREENV-MS-N629222:49:47:BGP SSA-GREENV-MS-N629222:49 X',324);
do i=1 to 100;
output;
end;
run;
data WANT1;
set HAVE;
/* length host $19;*/
keep HOST;
POS=0;
do while(find(STR,'SSA-',POS+1));
POS=find(STR,'SSA-',POS+1);
HOST=substr(STR,POS,19);
output;
end;
run;
data want2(keep=hos_t_name);
set have;
length hos_t_name $19;
retain _t_prxid 0;
if _n_=1 then _t_prxid=prxparse('/ssa-\w{6}-\w{2}-\w{5}/i');
_t_start = 1;
_t_stop = lengthn(STR);
call prxnext(_t_prxid, _t_start, _t_stop, STR, _t_pos, _t_len);
do while (_t_pos > 0);
hos_t_name = substr(STR, _t_pos, _t_len);
output;
call prxnext(_t_prxid, _t_start, _t_stop, STR, _t_pos, _t_len);
end;
run;
data WANT1;
set HAVE;
/* length host $19;*/
keep HOST;
POS=0;
do while(find(STR,'SSA-',POS+1));
POS=find(STR,'SSA-',POS+1);
HOST=substr(STR,POS,19);
NOTE: Variable "HOST" was given a default length of 32767 as the result of a function call. If you do not like this, please use a LENGTH statement to declare "HOST".
output;
end;
run;
NOTE: There were 100 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT1 has 95400 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 3.78 seconds
user cpu time 0.51 seconds
system cpu time 0.93 seconds
memory 1523.56k
OS Memory 15096.00k
Timestamp 07/01/2017 10:30:30 AM
Step Count 11 Switch Count 72

data want2(keep=hos_t_name);
set have;
length hos_t_name $19;
retain _t_prxid 0;
if _n_=1 then _t_prxid=prxparse('/ssa-\w{6}-\w{2}-\w{5}/i');
_t_start = 1;
_t_stop = lengthn(STR);
call prxnext(_t_prxid, _t_start, _t_stop, STR, _t_pos, _t_len);
do while (_t_pos > 0);
hos_t_name = substr(STR, _t_pos, _t_len);
output;
call prxnext(_t_prxid, _t_start, _t_stop, STR, _t_pos, _t_len);
end;
run;
NOTE: There were 100 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT2 has 95400 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 2.44 seconds
user cpu time 2.23 seconds
system cpu time 0.03 seconds
memory 1526.00k
OS Memory 14636.00k
Timestamp 07/01/2017 10:30:32 AM
Step Count 12 Switch Count 80
True.
Setting the length reduces I/Os and avoiding RegEx saves CPU.
@ChrisNZ wrote:
This works better:
Maybe the OP will tell or show us what the output should actually look like.