Hi,
I need to extract the variable names from the macro variable &var (which is not working well with prxposn) and then test whether those variables are empty. When they are empty, I need to drop the separator (a slash / in this case) so that the output column is empty instead of showing the separator alone. The concatenation result should be stored in the variable code.
The approach below is based on prx expressions because I have no control over the variable names that will appear in the macro variable &var used as input (a variable name may contain digits, and the names to extract are the ones before and after the <> token).
%let var = [aeterm9] aedecod </@2> aeterm ; /* aedecod and aeterm are two variables in the dataset adae, but they could be replaced by other ones */
data want;
  set adae;
  patternID1 = prxparse("/(\[(\w+)\])?\s*(\w*)/");
  /* define a pattern for <Separator1DigitsSeparator2> Variable2 */
  patternID2 = prxparse("/<([^<>0-9]*)(\d*)([^<>0-9]*)>\s*(\w*)/");
  /* define a pattern for matched braces ')' or '(' */
  patternID3 = prxparse("/<[^a-zA-Z\<\>]*(\(|\))[^a-zA-Z\<\>]*\>/");
  do i = 1 to 2;
    test1 = scan("[aeterm9] aedecod </@2> aeterm", i, '<>');
    test2 = prxposn(patternID1, 3, test1);
  end;
  varlist = test2;
  if test2 ne '' and test2 ne '.' then do;
    code = cats('strip(put(', test2, ',', vformatx(test2), '))');
    label = vlabelx(test2);
  end;
  else do;
    code = '';
    label = '';
  end;
run;
I hope I have explained the problem clearly enough. Thanks in advance for your help.
Accepted Solutions
Please answer @SASJedi's very valid questions: if your source string has a "stable" pattern, things become much easier and no RegEx is required.
If things aren't that simple, then which RegEx will work, and how complex it needs to be to identify a SAS variable name without also picking up false positives, depends very much on the patterns you have to cover.
In SAS data step syntax, for example, each of the following contains a variable name:
first.varname function(varname) array_name[varname] etc.
The RegEx for a string that complies with SAS naming conventions for a variable could look like:
[_[:alpha:]]\w{0,31}
- one to 32 characters
- First character is a letter or underscore
- 2nd to 32nd character is an underscore, letter or digit
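As a quick sanity check of these rules, here is a minimal sketch that runs an anchored version of the pattern next to the built-in NVALID function with the 'v7' option. The test strings and the variable names rx_ok and nv_ok are made up for illustration, and the two checks are only assumed to agree on simple cases like these.

```sas
data _null_;
  length name $40;
  /* anchored form of the pattern: the whole string must be one valid name */
  _prxid = prxparse('/^[_[:alpha:]]\w{0,31}$/');
  do name = 'aedecod', '_x1', '9lives', 'has space';
    rx_ok = prxmatch(_prxid, trim(name)) > 0;  /* 1 when the RegEx matches */
    nv_ok = nvalid(trim(name), 'v7');          /* 1 when SAS accepts the name under V7 rules */
    put name= rx_ok= nv_ok=;
  end;
run;
```

The first two strings should be flagged as valid by both checks and the last two as invalid. NVALID only tells you that a string could be a variable name, not that the variable exists in your dataset.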
Here is a RegEx that returns the sub-strings you defined as desired.
data test;
  source_string = "[aeterm9] aedecod </@2> aeterm";
  length found $32;
  /* a valid SAS name delimited by blanks or by the start/end of the string */
  _prxid = prxparse('/(^|[ ])([_[:alpha:]]\w{0,31})($|[ ])/i');
  _start = 1;
  _stop = length(source_string);
  call prxnext(_prxid, _start, _stop, trim(source_string), _pos, _len);
  do while (_pos > 0);
    /* capture group 2 holds the name without the surrounding blanks */
    found = prxposn(_prxid, 2, source_string);
    output;
    call prxnext(_prxid, _start, _stop, trim(source_string), _pos, _len);
  end;
run;
proc print data=test;
run;
Maybe try the BasePlus package and its %getVars() macro.
Example 3:
%put *%getVars(sashelp.class, pattern=i|a)*;
%put *%getVars(sashelp.class, pattern=^w)*;
%put *%getVars(sashelp.class, pattern=ght$)*;
or Example 4:
%put *%getVars(sashelp.class, sep=+, pattern=^(w|h)|x$, varRange=_numeric_)*;
Bart
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug
"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings
SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation
The golden rule says: "if you can solve it with or without regular expressions, solve it without":
%let var = [aeterm9] aedecod </@2> aeterm ;
data _null_;
  str = symget('var');
  length v $ 32;
  do i = 1 to countw(str, " ");
    v = compress(scan(str, i, " "), "_", "KAD");
    call symputX(cats("V_", i), v, "G");
  end;
run;
%put &=V_1. &=V_2. &=V_3. &=V_4.;
Bart
Hi @yabwon, I appreciate the way you simplify things! It works well outside a macro structure, but I would prefer an approach based on prx functions so that it fits my existing structure. Thanks! 🙂
Making it a macro is almost trivial:
%let var = [aeterm9] aedecod </@2> aeterm ;
%macro cutIntoParts(varName);
  %local i;
  %do i = 1 %to %sysfunc(countw(%superq(&varName.), %str( )));
    %global V_&i.;
    %let V_&i. = %sysfunc(compress(%scan(%superq(&varName.), &i., %str( )),_,KAD));
  %end;
%mend cutIntoParts;
%cutIntoParts(var)
%put &=V_1. &=V_2. &=V_3. &=V_4.;
Bart
- Is the text in the macro variable consistently laid out as shown below, and you just want to extract varname1 and varname2? Or does the layout vary?
[some text] varname1 <@n> varname2
- You provided a sample input value of "[aeterm9] aedecod </@2> aeterm". What would the values of test1 and test2 be if your code were working the way you want?
- Please provide a simple sample of the dataset adae. Your use of the functions VFORMATX and VLABELX doesn't make sense without it for context.
1- Yes, the goal is to extract the variable names from [some text] varname1 <@n> varname2, under the assumption that there may be multiple variable names and that the characters making up those names can vary.
2- test1 and test2 are expected to contain the values plus the separator (/ as an example), e.g. aeterm1/aedecod1 or /aeterm2/aedecod2/. The goal is to exclude the separator when the variables are empty, so that a lone separator is not displayed in the report.
3- The dataset could be adae or another one, and we need to concatenate two or more variables with the separator (any symbol that will be inserted between the variable values: </> <#> <$> <-> <> <(> <)> <!> <|> )
@Patrick it seems your approach matches exactly what I need. I will build on it to exclude the separator from the concatenation when one or more variables are empty. Thanks!
@hamza_saspg wrote:
Hi,
I need to extract the variable names from the given macro variable &var (which is not working well using prxposn), then test if they are empty.
What does "test if they are empty" mean? I don't see any code actually testing values of variables.
SAS has MISSING, and can test for that.
I would say that, instead of bothering with such a macro, use the tools SAS already provides, such as Proc Freq with the NLEVELS option.
ods select nlevels;
ods output nlevels=myleveldataset; /* if you want a data set */
proc freq data=yourdataset nlevels;
run;
For example, running that on SASHELP.CLASS will show this content in the level dataset. If NNonMissLevels is not equal to zero, then that variable has at least one non-missing value.
An NMiss Levels value of zero means that the variable has no missing values at all.
Table Var   NLevels   NMiss Levels   NNonMiss Levels
Name        19        0              19
Sex         2         0              2
Age         6         0              6
Height      17        0              17
Weight      15        0              15
The NMiss Levels count allows for the special missing values, so a numeric variable can have up to 28 distinct missing values: . and ._ and .A through .Z
@ballardw I see your point, but the expected result is to concatenate two variables and to exclude the separator when one or more of them are empty, so I need an approach that fits an existing structure based on prx functions and macros
@hamza_saspg wrote:
@ballardw I see your point, but the expected result is to concatenate two variables and to exclude the separator when one or more of them are empty, so I need an approach that fits an existing structure based on prx functions and macros
Perhaps you could post some examples of what you have and what you expect as output.
If I understand the requirement, the CATX function does what you describe: it inserts the separator between values and skips any value that is missing.
data example;
  var1 = '';
  var2 = 'sometext';
  var3 = '';
  var4 = 'something else';
  out1 = catx('|', var1, var2);
  out2 = catx('|', var1, var3, var2);
  out3 = catx('|', var1, var3, var2, var4);
  out4 = catx('|', var1, var1, var3);
  out5 = catx('|', var2, var1, var3, var4);
run;
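To connect this with the prx-based extraction earlier in the thread, here is a rough sketch that pulls the names out of the specification string with CALL PRXNEXT and feeds each variable's value to CATX through VVALUEX. It is only an illustration under several assumptions: the dataset adae_demo and its four rows are invented stand-ins for adae, the separator is hard-coded as '/' rather than parsed from the <...> token, and every extracted name is assumed to exist in the dataset.

```sas
/* hypothetical stand-in for adae, including empty values */
data adae_demo;
  infile datalines dsd truncover;
  input aedecod :$20. aeterm :$20.;
datalines;
Headache,Mild headache
,Nausea
Rash,
,
;

data want;
  set adae_demo;
  length spec code $200 _name $32;
  spec = "[aeterm9] aedecod </@2> aeterm";
  /* a valid SAS name delimited by blanks or by the start/end of the string */
  _prxid = prxparse('/(^|[ ])([_[:alpha:]]\w{0,31})($|[ ])/');
  _start = 1;
  _stop = length(spec);
  call missing(code);
  call prxnext(_prxid, _start, _stop, trim(spec), _pos, _len);
  do while (_pos > 0);
    _name = prxposn(_prxid, 2, spec);
    /* CATX skips blank arguments, so an empty variable adds no separator */
    code = catx('/', code, vvaluex(_name));
    call prxnext(_prxid, _start, _stop, trim(spec), _pos, _len);
  end;
  drop spec _:;
run;

proc print data=want;
run;
```

With the four demo rows, code should come out as Headache/Mild headache, Nausea, Rash and a blank value, so no lone separator is left behind. Because the pattern requires a blank on both sides of a name, two names separated by a single blank would not both be found; that case would need a different pattern.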