Steve1964
Obsidian | Level 7

I am using a segment of code from paper 052-2009, written by Rick Langston for SAS Global Forum 2009, to create a text file from HTML. The code reads:

filename in url 'https://www.zaobao.com/realtime/china/story20211010-1201937';
filename out 'd:\mydata\myfile.txt';
data _null_;
infile in lrecl=1 recfm=f end=eof;
file out lrecl=1 recfm=f;
input @1 x $char1.;
put @1 x $char1.;
if eof;
call symputx('filesize',_n_);
run;

I'm puzzled by the following:

1) What is a data line of the input HTML file? Do the options lrecl=1 recfm=f in the INFILE statement make the INPUT statement read at most one character at a time?

2) How does the INPUT statement read a Chinese character into variable x, since the informat $char1. makes it read one character at a time?

3) I want to create a SAS data set using the code

filename in url 'https://www.zaobao.com/realtime/china/story20211010-1201937';
filename out 'd:\mydata\myfile.txt';
data  dst;
infile in lrecl=1 recfm=f end=eof;
file out lrecl=1 recfm=f;
input @1 x $char1.;
put @1 x $char1.;
if eof;
call symputx('filesize',_n_);
run;

The SAS dataset dst has only one observation, with a missing value. Why?

Kurt_Bremser
Super User

RECFM=F means fixed-length records. Line-ending characters or sequences (LF, CR, CRLF) are not treated as record delimiters; they are read as ordinary data bytes.

LRECL=1 means that every record is exactly one byte long. Multi-byte UTF-8 characters will therefore be read piecemeal, each of their constituent bytes as a separate record.
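
To see the byte-by-byte behaviour on a small scale, here is a minimal sketch. It assumes a UTF-8 SAS session; the fileref TMP, the dataset BYTES and the variable HEX are made-up names for illustration, not part of the original program.

```sas
/* Sketch only: TMP, BYTES and HEX are hypothetical names. */
filename tmp temp;

/* Write the three UTF-8 bytes of one Chinese character (E4 B8 AD), */
/* as a single fixed-length record without any line ending.         */
data _null_;
  file tmp recfm=f lrecl=3;
  put 'E4B8AD'x;
run;

/* Read the file back one byte per record, as the original code does. */
data bytes;
  infile tmp recfm=f lrecl=1;
  input x $char1.;
  hex = put(x, $hex2.);  /* hex value of the single byte just read */
run;

proc print data=bytes;
  var hex;
run;
```

Under these assumptions BYTES should have one observation per byte rather than one per character, so x never holds a complete Chinese character on its own.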

Your code defines a Boolean variable (eof) that is set to true when the last byte of the input is read. Because you test it in a subsetting IF, the step's implicit OUTPUT is reached only on that final iteration, so only that last byte ends up in the dataset. Your code still writes all of the web data to the text file, because the INPUT and PUT statements execute before the subsetting IF.
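
If the goal is to keep every byte in dst, one possible change (a sketch, not the only way) is to replace the subsetting IF with an IF-THEN statement, so that only the CALL SYMPUTX is conditional. The filerefs and variable names are those from the question; the closing %PUT line is added only to display the macro variable.

```sas
/* Sketch only: same filerefs and names as in the question. */
filename in url 'https://www.zaobao.com/realtime/china/story20211010-1201937';
filename out 'd:\mydata\myfile.txt';

data dst;
  infile in lrecl=1 recfm=f end=eof;
  file out lrecl=1 recfm=f;
  input @1 x $char1.;
  put @1 x $char1.;
  /* IF-THEN instead of a subsetting IF: the iteration is not cut short, */
  /* so the implicit OUTPUT at the end of the step runs for every byte.  */
  if eof then call symputx('filesize', _n_);
run;

%put NOTE: filesize=&filesize;
```

With this change dst should contain one observation per byte read, and the text file is written exactly as before.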

Steve1964
Obsidian | Level 7

Thanks!

 Another question:

To my knowledge, without a double trailing @@ the INPUT statement reads one character from the current data line in the current DATA step iteration, and one character from the next data line in the next iteration. Is it right that the INPUT statement treats each character as a separate data line, since any line-ending character or sequence (LF, CR, CRLF) is disregarded?
