Hello Team,
I am trying to read the Out_test.txt file below and extract the highlighted text from it with the code below.
Out_test.txt :
<?xml version="1.0" encoding="utf-8"?> <entry xml:base="http://Test.sas.api" xmlns="http://www.w3.org/2005/Atom" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" xmlns:georss="http://www.georss.org/georss" xmlns:gml="http://www.opengis.net/gml"> <id>//Test.sas.api('/iapps/SAR/files/163... new test file €.log')</id> <category term="SP.File" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" /> <link rel="edit" href="Web/GetFileByServerRelativeUrl('/iapps/SAR/files/mynewfile%20%20new%20test%20file%20%E2%82%AC.log')" /> <link rel="http://schemas.microsoft.com/ado/2007/08/dataservices/related/Author" type="application/atom+xml;type=entry" title="Author" href="Web/GetFileByServerRelativeUrl('/iapps/SAR/files/1632849929_mynewfile%20%20new%20test%20file%20%E2%82%AC.log')/Auth <link rel="http://schemas.microsoft.com/ado/2007/08/dataservices/related/CheckedOutByUser" type="application/atom+xml;type=entry" title="CheckedOutByUser" href="Web/GetFileByServerRelativeUrl('/iapps/SAR/files/mynewfile%20%20new%20test%20file%20%E2%82%AC.l <title /> <updated>2021-09-28T17:32:50Z</updated> <author> <name /> </author> <content type="application/xml"> <m:properties> <d:CheckInComment></d:CheckInComment> <d:CheckOutType >2</d:CheckOutType> <d:UIVersionLabel>1.0</d:UIVersionLabel> </m:properties> </content> </entry>
code :
data Test_out_href;
length text $32767;
infile "/home/out_test.txt";
input;
text=scan(substr(_infile_, index(_infile_,"href")+37),1,"')");
call symputx("ServerRelativeUrl_new", text);
run;
%put this is to test &ServerRelativeUrl_new.
Log :
NOTE: 1 record was read from the infile "/home/out_test.txt".
The minimum record length was 3511.
The maximum record length was 3511.
NOTE: The data set WORK.TEST_OUT_HREF has 1 observations and 1 variables.
NOTE: Compressing data set WORK.TEST_OUT_HREF increased size by 100.00 percent.
Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 1071.40k
OS Memory 24492.00k
Timestamp 09/28/2021 10:22:44 PM
Step Count 55 Switch Count 2
Page Faults 0
Page Reclaims 165
Page Swaps 0
2 The SAS System Tuesday, September 28, 2021 07:05:00 PM
Voluntary Context Switches 23
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 528
33 %put this is to test &ServerRelativeUrl_new.
34
35
36 GOPTIONS NOACCESSIBLE;
WARNING: Apparent invocation of macro E2 not resolved.
WARNING: Apparent invocation of macro AC not resolved.
this is to test /iapps/SAR/files/mynewfile%20%20new%20test%20file%20%E2%82%AC.log.log
The extra text ".log" is appended to the macro variable. Can I know the reason behind it?
In my code, I am trying to find the first occurrence of href in the file and take the text that starts 37 characters after it, up to the closing parenthesis.
text=scan(substr(_infile_, index(_infile_,"href")+37),1,"')");
I think the extra extension is a bug in the error message.
But you need to add some macro quoting because of the strange text you are including in the value. Otherwise things like %AC are going to look like a macro call to the macro processor.
%let ServerRelativeUrl_new=%superq(ServerRelativeUrl_new);
Example:
188 data _null_;
189 text='/iapps/SAR/files/Mytestfile_%20%20new%20test%20file%20%E2%82%AC.txt';
190 call symputx("ServerRelativeUrl_new",text);
191 run;
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
192
WARNING: Apparent invocation of macro E2 not resolved.
193 %put &=ServerRelativeUrl_new;
WARNING: Apparent invocation of macro AC not resolved.
SERVERRELATIVEURL_NEW=/iapps/SAR/files/Mytestfile_%20%20new%20test%20file%20%E2%82%AC.txt.txt
194 %let ServerRelativeUrl_new=%superq(ServerRelativeUrl_new);
195 %put &=ServerRelativeUrl_new;
SERVERRELATIVEURL_NEW=/iapps/SAR/files/Mytestfile_%20%20new%20test%20file%20%E2%82%AC.txt
Do you need to convert those %20 things back into the characters they mean?
196 data _null_;
197 text='/iapps/SAR/files/Mytestfile_%20%20new%20test%20file%20%E2%82%AC.txt';
198 call symputx("ServerRelativeUrl_new",urldecode(text));
199 run;
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
200
201 %put &=ServerRelativeUrl_new;
SERVERRELATIVEURL_NEW=/iapps/SAR/files/Mytestfile_ new test file €.txt
202 %let ServerRelativeUrl_new=%superq(ServerRelativeUrl_new);
203 %put &=ServerRelativeUrl_new;
SERVERRELATIVEURL_NEW=/iapps/SAR/files/Mytestfile_ new test file €.txt
Are you perhaps running using UTF-8?
Hint: if you want us to play with reading text, please post the text into a TEXT box opened on the forum with the </> icon that appears above the message window. The main message windows on this forum reformat pasted text, for example by inserting HTML tags and removing spaces. End-of-line characters may also get changed.
So there is almost no chance that what you posted in your question is actually the same as your source text file.
First, I would make sure that href is actually in the string before using it for anything. Also, if the input file is treated as having more than one line, you will overwrite your macro variable: CALL SYMPUTX runs unconditionally for every record, so the value left at the end is whatever was extracted from the last line read, which may be a bad value for TEXT. Note that href appears more than once in that information.
I would troubleshoot this by reading the data in one step and assigning the value of _infile_ to a variable, then using another data step to parse and extract the desired text. That way I can see what the value of _infile_ actually is and diagnose issues.
I also suggest finding some way to identify the line you want so that the search for "href" only happens where you actually want it to.
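The suggestions above (keep the raw record, check that href is present, and search only the line you want) could be sketched roughly as follows. This is only an illustration: the data set name raw_lines, the variable names line and pos, and the choice of rel="edit" href= as the marker for the wanted link are assumptions, not something confirmed by the original poster.

```sas
/* Step 1: keep each raw record in a variable so it can be inspected. */
data raw_lines;
  infile "/home/out_test.txt" lrecl=32767 truncover;
  length line $32767;
  input;
  line = _infile_;
run;

/* Step 2: parse only a record that contains the wanted link.        */
/* The marker text below is an assumption about which link is wanted. */
data _null_;
  set raw_lines;
  length text $32767;
  pos = index(line, 'rel="edit" href=');
  if pos > 0 then do;
    /* The second word between ( ' ) delimiters is the quoted path. */
    text = scan(substr(line, pos), 2, "(')");
    call symputx("ServerRelativeUrl_new", text);
    stop; /* keep the first match; later records cannot overwrite it */
  end;
run;
```

Because the macro variable is only assigned inside the IF block, a record without the marker leaves it untouched, and raw_lines can be printed to see exactly what SAS read.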
Why is there no semicolon after the %put statement?
What's in the data set variable?
Also this is more robust:
TEXT=scan( substr(_infile_, index(_infile_,"href")), 2, "(')" );
Thanks, this works in a similar way.
Code :
data Test_out;
length text2 $32767;
infile "/home/out_test.txt";
input;
TEXT2=scan( substr(_infile_, index(_infile_,"href")), 2, "(')" );
put 'this is to test inside data step ' TEXT2;
call symputx("ServerRelativeUrl_new", strip(TEXT2));
run;
%put this is to test out side &ServerRelativeUrl_new. ;
Log :
this is to test inside data step /iapps/SAR/files/Mytestfile_%20%20new%20test%20file%20%E2%82%AC.txt
NOTE: 1 record was read from the infile "/home/b723166/out_test.txt".
The minimum record length was 3511.
The maximum record length was 3511.
NOTE: The data set WORK.TEST_OUT has 1 observations and 1 variables.
NOTE: Compressing data set WORK.TEST_OUT increased size by 100.00 percent.
Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 1092.56k
OS Memory 21164.00k
Timestamp 09/29/2021 11:00:56 AM
Step Count 11 Switch Count 2
2 The SAS System Wednesday, September 29, 2021 10:48:00 AM
Page Faults 0
Page Reclaims 181
Page Swaps 0
Voluntary Context Switches 18
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 528
35
36 %put this is to test out side &ServerRelativeUrl_new.
37
38 GOPTIONS NOACCESSIBLE;
WARNING: Apparent invocation of macro E2 not resolved.
WARNING: Apparent invocation of macro AC not resolved.
this is to test out side /iapps/SAR/files/Mytestfile_%20%20new%20test%20file%20%E2%82%AC.txt.txt
The variable works fine inside the data step, but when the macro variable is used outside the data step, extra text is appended, as in /iapps/SAR/files/Mytestfile_%20%20new%20test%20file%20%E2%82%AC.txt.txt
Please suggest some hints to fix this.
This also gives a similar result:
data Test_out;
%let TEXT2='/iapps/SAR/files/mytestfile_%20%20new%20test%20file%20%E2%82%AC.txt';
put 'this is to test inside data step ' &TEXT2;
call symputx("ServerRelativeUrl_new2", strip(&TEXT2));
run;
%put this is to test out side &ServerRelativeUrl_new2.
Log :
NOTE: Compression was disabled for data set WORK.TEST_OUT because compression overhead would increase the size of the data set.
this is to test inside data step /iapps/SAR/files/mytestfile_%20%20new%20test%20file%20%E2%82%AC.txt
NOTE: The data set WORK.TEST_OUT has 1 observations and 0 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 394.18k
OS Memory 20388.00k
Timestamp 09/29/2021 11:18:11 AM
Step Count 16 Switch Count 2
Page Faults 0
Page Reclaims 55
Page Swaps 0
Voluntary Context Switches 15
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 136
31
32 %put this is to test out side &ServerRelativeUrl_new2.
33
34 GOPTIONS NOACCESSIBLE;
WARNING: Apparent invocation of macro E2 not resolved.
WARNING: Apparent invocation of macro AC not resolved.
2 The SAS System Wednesday, September 29, 2021 10:48:00 AM
this is to test out side /iapps/SAR/files/mytestfile_%20%20new%20test%20file%20%E2%82%AC.txt.txt
Here's a suggestion to test.
%put _user_;
%put text is &ServerRelativeUrl_new. ;
%put;
The _user_ keyword tells %put to list all user-defined macro variables. Hopefully you haven't generated many in the current session. Note that the _user_ line does not generate the warning about the macro invocation and does not show the extra .log.
The second %put, by contrast, does produce the warning and does show the extra .log. Think there might be a connection?
I think you are running into one of the macro resolution rules involving the dot character. Run this and you can see that the . immediately following an "apparent invocation of macro" duplicates text.
data _null_;
x='text %be.abc';
call symputx('xtest',x);
y='text %be abc';
call symputx('ytest',y);
run;
%put Xtest is: &xtest. Ytest is: &ytest. ;
I don't have a suggestion that will work right now if you really want the %E2 and %AC characters, since this is happening at macro resolution time, not at assignment or creation.
Thanks, everyone, for the valuable input. With %superq the value works as expected:
data _null_;
text='/Eapps/SAR/files/Mytestfile_%20%20new%20test%20file%20%E2%82%AC.txt';
call symputx("ServerRelativeUrl_new",text);
run;
%put &ServerRelativeUrl_new and the other value is %superq(ServerRelativeUrl_new);
Welcome to the wonderful world of blanks and special characters in filenames.
(sarcasm intended)
This causes the XML to use %xx sequences to encode the special characters (hex E282AC is the Euro symbol in UTF-8), which in turn trips up the SAS macro processor. The blanks (encoded as %20) do not cause a problem because 20 can't be a macro name.
Note that this only happens in the %PUT, the macro variable is correct:
data Test_out_href;
length text $32767;
input;
text=scan(substr(_infile_, index(_infile_,"href")+37),1,"')");
call symputx("ServerRelativeUrl_new", text);
datalines;
<link rel="edit" href="Web/GetFileByServerRelativeUrl('/iapps/SAR/files/mynewfile%20%20new%20test%20file%20%E2%82%AC.log')" />
;
%put this is to test &ServerRelativeUrl_new.;
data check;
text = symget("ServerRelativeUrl_new");
run;
"log" is not duplicated when the contents of the macro variable are retrieved with SYMGET and the macro processor itself does not need to intervene.
Thanks Kurt. I agree that this issue is caused by special characters like '%' in the variable. SYMGET helps in some situations, but we cannot use functions like SYMGET when we are passing the value as an input parameter to macros. We would like to use the string with special characters in a variable in several places in the code.
For example, we can't use SYMGET to pass the value to a macro as below, which in turn results in an error.
%Test_macro (url=%sysfunc(symget(ServerRelativeUrl_new)) );
The %put is just to test the value of the variable, but we need to use the URL variable in different places. Is there any way to deal with this issue?
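One possible approach, building on the %superq suggestion earlier in the thread, is to pass the value to the macro through %superq instead of SYMGET. The sketch below is only an illustration: the body of %Test_macro is a hypothetical stand-in, since the real macro was not posted, and it assumes the value contains no commas or unbalanced parentheses that would interfere with parameter parsing.

```sas
/* Hypothetical stand-in for the %Test_macro mentioned in the post. */
%macro Test_macro(url=);
  %put url inside the macro is: &url;
%mend Test_macro;

data _null_;
  text='/iapps/SAR/files/Mytestfile_%20%20new%20test%20file%20%E2%82%AC.txt';
  call symputx("ServerRelativeUrl_new", text);
run;

/* %superq returns the stored value with % and & masked, so %E2 and */
/* %AC should not be treated as macro calls when the parameter is   */
/* resolved inside the macro.                                        */
%Test_macro(url=%superq(ServerRelativeUrl_new));
```

If the value has to be used in many places, re-storing it once with %let ServerRelativeUrl_new=%superq(ServerRelativeUrl_new); as shown earlier may be simpler, because later plain references to &ServerRelativeUrl_new then resolve to the masked value.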