Hi,
I need to extract the variable names from the given macro variable &var (which is not working well using prxposn), then test if they are empty. If they are, I need to drop the separator (a slash / in this case) so that the output column is empty rather than containing the separator alone. The concatenation result should be stored in the variable code.
The approach below is based on PRX expressions because I have no control over which variable names will appear in the macro variable &var used as input (a variable name may contain digits, and the names to pick up are the ones before and after <>).
%let var = [aeterm9] aedecod </@2> aeterm ; /*here aedecod and aeterm are two variables included in the dataset adae, but we could replace them by other ones*/
data want;
set adae ;
patternID1=prxparse("/(\[(\w+)\])?\s*(\w*)/");
/* define a pattern for <Separator1DigitsSeparator2> Variable2 */
patternID2=prxparse("/<([^<>0-9]*)(\d*)([^<>0-9]*)>\s*(\w*)/");
/* define a pattern for matched braces ')' or '(' */
patternID3=prxparse("/<[^a-zA-Z\<\>]*(\(|\))[^a-zA-Z\<\>]*\>/");
do i=1 to 2;
test1= scan("[aeterm9] aedecod </@2> aeterm" , i , '<>' ) ;
test2=prxposn(patternID1,3,test1);
end;
varlist= test2 ;
if test2 ne '' and test2 ne '.' then do;
code=cats('strip(put(',test2,',',vformatx(test2),'))');
label=vlabelx(test2);
end;
else do;
code='';
label='';
end;
run;
I hope I have explained the problem clearly enough. Thanks in advance for your help.
Please answer @SASJedi's very valid questions, because if your source string has a "stable" pattern then things will become much easier and no RegEx will be required.
If things aren't that simple then which RegEx will work and how complex it needs to be to identify a SAS variable name without also picking false positives will very much depend on what patterns you have to cover.
With SAS data step syntax the following would be variables.
first.varname function(varname) array_name[varname] etc.
The RegEx for a string that complies with SAS naming conventions for a variable could look like:
[_[:alpha:]]\w{0,31}
- one to 32 characters
- First character is a letter or underscore
- 2nd to 32nd character is an underscore, letter or digit
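The three rules above can be checked against a few candidate strings. This is only a minimal sketch: the candidate values are made up for illustration, and the pattern is anchored with ^ and $ (an addition, not part of the expression above) so that the whole string has to qualify as a name.

```sas
data _null_;
  length candidate $40;
  /* anchored variant: the entire string must be a valid name */
  prxid = prxparse('/^[_[:alpha:]]\w{0,31}$/');
  /* hypothetical candidates: the first two follow the rules, the last two do not */
  do candidate = 'aedecod', '_aeterm9', '9aeterm', 'ae-term';
    /* strip() removes trailing blanks so that $ can match at the end of the name */
    is_name = prxmatch(prxid, strip(candidate)) > 0;
    put candidate= is_name=;
  end;
run;
```

The log should show is_name=1 for aedecod and _aeterm9, and is_name=0 for 9aeterm (starts with a digit) and ae-term (contains a hyphen).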
Here is some RegEx that returns the sub-strings you defined as desired.
data test;
source_string="[aeterm9] aedecod </@2> aeterm";
length found $32;
_prxid=prxparse('/(^|[ ])([_[:alpha:]]\w{0,31})($|[ ])/i');
_start = 1;
_stop = length(source_string);
call prxnext(_prxid, _start, _stop, trim(source_string), _pos, _len);
do while (_pos > 0);
found = prxposn(_prxid, 2, source_string);
output;
call prxnext(_prxid, _start, _stop, trim(source_string), _pos, _len);
end;
run;
proc print data=test;
run;
Maybe try BasePlus package and the %getVars() macro.
Example 3:
%put *%getVars(sashelp.class, pattern=i|a)*;
%put *%getVars(sashelp.class, pattern=^w)*;
%put *%getVars(sashelp.class, pattern=ght$)*;
or Example 4:
%put *%getVars(sashelp.class, sep=+, pattern=^(w|h)|x$, varRange=_numeric_)*;
Bart
The golden rule says: "if you can solve it with or without regular expressions, solve it without":
%let var = [aeterm9] aedecod </@2> aeterm ;
data _null_;
str=symget('var');
length v $ 32;
do i = 1 to countw(str, " ");
v = compress(scan(str,i, " "),"_","KAD");
call symputX(cats("V_",i),v, "G");
end;
run;
%put &=V_1. &=V_2. &=V_3. &=V_4.;
Bart
Hi @yabwon, I appreciate the way you simplify things! It works well outside a macro structure, but I would prefer an approach based on PRX functions so that it fits my existing structure. Thanks! 🙂
Making it a macro is almost trivial:
%let var = [aeterm9] aedecod </@2> aeterm ;
%macro cutIntoParts(varName);
%local i;
%do i = 1 %to %sysfunc(countw(%superq(&varName.), %str( )));
%global V_&i.;
%let V_&i. = %sysfunc(compress(%scan(%superq(&varName.), &i., %str( )),_,KAD));
%end;
%mend cutIntoParts;
%cutIntoParts(var)
%put &=V_1. &=V_2. &=V_3. &=V_4.;
Bart
1- Yes, the goal is to extract the variable names from [some text] varname1 <@n> varname2, assuming that there could be multiple variable names with some diversity in the characters composing those names.
2- test1 and test2 are expected to contain the values plus the separator (/ in this example), e.g. aeterm1/aedecod1 or /aeterm2/aedecod2/. The goal is to drop the separator when the variables are empty, so that a lone separator does not show up in the report.
3- The dataset could be adae or another one, and we need to concatenate two or more variables with the separator (any symbol inserted between the variable values: </> <#> <$> <-> <> <(> <)> <!> <|> ).
@Patrick it seems like your approach is matching exactly what I need, then I will be able to build on it and exclude the separator from the concatenation when one or multiple variables are empty. Thanks!
@hamza_saspg wrote:
Hi,
I need to extract the variable names from the given macro variable &var (which is not working well using prxposn), then test if they are empty.
What does "test if they are empty" mean? I don't see any code actually testing values of variables.
SAS has MISSING, and can test for that.
I would say that instead of bothering with such a macro use tools SAS has already provided such as Proc Freq with the NLEVELS option.
ods select nlevels;
ods output nlevels=myleveldataset; /* if you want a data set */
proc freq data=yourdataset nlevels;
run;
For example running that on SASHELP.CLASS will show this content in the level dataset. If the NNonMissLevels is not equal to zero then that variable has at least one non-missing value.
Nmiss of zero means that variable has no missing values at all.
TableVar   NLevels   NMissLevels   NNonMissLevels
Name            19             0               19
Sex              2             0                2
Age              6             0                6
Height          17             0               17
Weight          15             0               15
NMiss Levels allows for the special missing values, so a numeric variable can have up to 28 missing levels: . , ._ and .A through .Z
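For a row-level check rather than a dataset-level one, the MISSING function mentioned earlier in this reply can be sketched as follows. The variable names c, n and m are made up for illustration.

```sas
data _null_;
  length c $8;
  c = ' ';   /* blank character value */
  n = .A;    /* special numeric missing value */
  m = 0;     /* zero is a valid value, not a missing one */
  /* MISSING returns 1 for blank character values and for any numeric missing value */
  c_missing = missing(c);
  n_missing = missing(n);
  m_missing = missing(m);
  put c_missing= n_missing= m_missing=;
run;
```

The log should show c_missing=1 n_missing=1 m_missing=0.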
@ballardw I see your point, but the expected result is to concatenate two variables and to exclude the separator when one or more of them are empty, so I need an approach that fits an existing structure based on PRX functions and macros.
@hamza_saspg wrote:
@ballardw I see your point, but the expected result is to concatenate two variables and to exclude the separator when one or more of them are empty, so I need an approach that fits an existing structure based on PRX functions and macros.
Perhaps some examples of what you have and what you expect for output.
The CATX function will insert a string as you describe if I understand the requirement.
data example;
var1 = '';
var2 = 'sometext';
var3 = '';
var4 = 'something else';
out1 = catx('|',var1,var2);
out2 = catx('|',var1,var3,var2);
out3 = catx('|',var1,var3,var2,var4);
out4 = catx('|',var1,var1,var3);
out5 = catx('|',var2,var1,var3,var4);
run;
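Applied to the original question, the same CATX behaviour could be combined with VVALUEX, which returns the formatted value of a variable whose name is held in a character expression. This is only a rough sketch under stated assumptions: adae_demo is a hypothetical stand-in for adae, and var1 and var2 are assumed to hold names that were already extracted from &var (for example with the PRXNEXT loop earlier in the thread).

```sas
/* hypothetical stand-in for adae: three rows with different blank patterns */
data adae_demo;
  length aedecod aeterm $20;
  aedecod = 'HEADACHE'; aeterm = 'Headache'; output;
  aedecod = ' ';        aeterm = 'Nausea';   output;
  aedecod = ' ';        aeterm = ' ';        output;
run;

data want;
  set adae_demo;
  length var1 var2 $32 code $200;
  /* assumption: these names come from the extraction step, hard-coded here */
  var1 = 'aedecod';
  var2 = 'aeterm';
  /* CATX strips each argument and skips blank ones, so no lone separator remains */
  code = catx('/', vvaluex(var1), vvaluex(var2));
run;

proc print data=want;
run;
```

With this demo data, code should be HEADACHE/Headache in the first row, Nausea in the second, and blank in the third.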