- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have to search for set of strings that when used in PRXMATCH comes to more than 262 characters long and SAS gives an unbalanced quotation marks warning.
NOTE: The quoted string currently being processed has become more than 262 characters long.
You might have unbalanced quotation marks.
The run finishes inspite of the warning and I am wondering if I should break the PRXMATCH into several statements. A sample code is given below. The code runs fine and I am wondering if I should ignore the warning.
data _null_;
var="24900|24901|24910|24911|24920|24921|24930|24931|24940|24941|24950|24951|24960|24961|24970|24971|24980|24981|24990|24991|25000|25001|25002|25003|25010|25011|25012|25013|25020|25021|25022|25023|25030|25031|25032|25033|25040|25041|25042|25043|25050|25051|25052|25053|25060|25061|25062|25063|25070|25071|25072|25073|25080|25081|25082|25083|25090|25091|25092|25093|36641|V5391|V5867";
varlength=Length(var);
%put var;
%put varlength;
run;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
As is often the case, "there is an option for that".
Try
option noquotelenmax;
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
As is often the case, "there is an option for that".
Try
option noquotelenmax;
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Besides of noquotelenmax you could also just create your macro variable differently - which should also make it easier to maintain your list of values.
data values;
infile datalines truncover;
input var:$16.;
datalines;
24900
24901
24910
24911
24920
24921
24930
;
run;
proc sql noprint;
select var into :var_list separated by '|'
from values;
quit;
%put &=var_list;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks. That will not work for me as I have 50 such variables.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@PamG wrote:
Thanks. That will not work for me as I have 50 such variables.
@PamG The more you show and tell us what you have, the better we can provide advice.
Below sample code accepts as many lists (variables) as you have to then dynamically process these without the need for any further code change.
data work.var_val;
infile datalines truncover;
input name:$32. value:$16.;
datalines;
var1 24900
var1 24901
var1 24910
var1 24911
var1 24920
var1 24921
var1 24930
var2 25070
var2 25071
var2 25072
var2 25073
var2 25080
var2 25081
var2 25082
var2 25083
;
run;
proc sort data=work.var_val nodupkey;
by name value;
run;
data work.master;
infile datalines truncover;
length id source_str $40. match_flg_expected 3;
input id match_flg_expected source_str $40.;
datalines;
1 1 agvd98475249014374 25080 24930 25080
2 1 99999250729999
3 0 111112507911111
;
proc sql noprint;
select count(name), max(cnt_value) into :n_names trimmed, :n_values trimmed
from
(
select name, count(distinct value) as cnt_value
from work.var_val
group by name
)
;
quit;
/*data demo(keep=id search_var match_val pos len);*/
data work.demo(drop=_:);
if _n_=1 then
do;
_names_ind =1;
_values_ind=0;
do until(_done);
set work.var_val(keep=name value rename=(name=_name value=_value)) end=_done;
by _name;
array _values{&n_values} $16 _temporary_;
array _names {&n_names} $32 _temporary_;
array _prxid {&n_names} 8 _temporary_;
_values_ind+1;
_values[_values_ind]=_value;
if last._name then
do;
_names[_names_ind]=_name;
_prxid[_names_ind]=prxparse(cats('/(',catx(')|(',of _values[*]),')/'));
_names_ind+1;
_values_ind=0;
call missing(of _values[*]);
end;
end;
end;
set work.master;
length search_var $32 match_val $16;
_stop = length(source_str);
do _i=1 to &n_names;
_start = 1;
call prxnext(_prxid[_i], _start, _stop, trim(source_str), pos, len);
if pos>0 then search_var=_names[_i];
do while (pos > 0);
match_val = substr(trim(source_str), pos, len);
output;
call prxnext(_prxid[_i], _start, _stop, trim(source_str), pos, len);
end;
end;
/* below line only if also neew to output source rows with any match */
if missing(search_var) then output;
run;
proc print data=work.demo;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello @PamG,
@PamG wrote:
var="24900|24901|24910|24911|24920|24921|24930|24931|24940|24941|24950|24951|24960|24961|24970|24971|24980|24981|24990|24991|25000|25001|25002|25003|25010|25011|25012|25013|25020|25021|25022|25023|25030|25031|25032|25033|25040|25041|25042|25043|25050|25051|25052|25053|25060|25061|25062|25063|25070|25071|25072|25073|25080|25081|25082|25083|25090|25091|25092|25093|36641|V5391|V5867";
Note that the length of this particular regular expression can be reduced from 377 to merely 44 characters:
249[0-9][01]|250[0-9][0-3]|36641|V5391|V5867
This makes it not only more readable, but also about seven times faster (on my workstation) when used with PRXMATCH. And, of course, it avoids the "... more than 262 characters ..." note in the log.