Steve1964
Obsidian | Level 7

I use a segment of code from paper 052-2009, written by Rick Langston for SAS Global Forum 2009, to create a text file from HTML. The code reads:

filename in url 'https://www.zaobao.com/realtime/china/story20211010-1201937';
filename out 'd:\mydata\myfile.txt';
data _null_;
infile in lrecl=1 recfm=f end=eof;
file out lrecl=1 recfm=f;
input @1 x $char1.;
put @1 x $char1.;
if eof;
call symputx('filesize',_n_);
run;

I'm puzzled by the following:

1) What is a data line of the input HTML file? Do the options lrecl=1 recfm=f in the INFILE statement make the INPUT statement read at most one character at a time?

2) How does the INPUT statement read a Chinese character into variable x, since the informat $char1. makes it read one character at a time?

3) I want to create a SAS data set using the code

filename in url 'https://www.zaobao.com/realtime/china/story20211010-1201937';
filename out 'd:\mydata\myfile.txt';
data  dst;
infile in lrecl=1 recfm=f end=eof;
file out lrecl=1 recfm=f;
input @1 x $char1.;
put @1 x $char1.;
if eof;
call symputx('filesize',_n_);
run;

The SAS data set dst has only one observation, with a missing value. Why?

Kurt_Bremser
Super User

RECFM=F means records of a fixed length; any line-ending character or sequence (LF, CR, CRLF) is not treated as a record delimiter and is instead read as ordinary bytes.

LRECL=1 means to always read one byte as one record. Multi-byte UTF characters will be read piecemeal, each of their constituent bytes separately.
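The byte-by-byte reading described above can be checked with a small, self-contained sketch. It does not use the original URL; instead it assumes a temporary fileref named demo (a hypothetical name) holding the three bytes E4 B8 AD, which are the UTF-8 encoding of the Chinese character U+4E2D. It also assumes no ENCODING= option is in effect, so no transcoding takes place.

```sas
/* Hypothetical temporary file holding the 3 UTF-8 bytes of U+4E2D */
filename demo temp;

data _null_;
  file demo recfm=f lrecl=3;   /* one fixed 3-byte record, no line terminator */
  put 'E4B8AD'x;               /* hex character constant: bytes E4, B8, AD */
run;

/* Read the file back with one byte per record */
data bytes;
  infile demo recfm=f lrecl=1;
  input x $char1.;             /* x holds a single byte, not a whole character */
  hex = put(x, $hex2.);        /* hex representation of that byte */
  put _n_= hex=;               /* write the record number and byte to the log */
run;
```

Under these assumptions the data set bytes should hold one observation per byte rather than one observation per character, so no single value of x contains the complete Chinese character.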

Your code defines a boolean variable (eof) for the end of input data which will be set to true when the last byte is read. Since you use it in a Subsetting IF, only that byte will end up in the dataset. But your code will write all web data to the text file (because the INPUT/PUT happen before the subsetting IF).
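If the goal is to keep every byte in the data set, one possible adjustment (a sketch, not code from the paper) is to replace the subsetting IF with an IF-THEN statement, so that only the CALL SYMPUTX is conditional and the implicit output at the end of each iteration still happens. The filerefs and paths are the ones from the question.

```sas
filename in url 'https://www.zaobao.com/realtime/china/story20211010-1201937';
filename out 'd:\mydata\myfile.txt';

data dst;
  infile in lrecl=1 recfm=f end=eof;
  file out lrecl=1 recfm=f;
  input @1 x $char1.;   /* read one byte */
  put @1 x $char1.;     /* copy that byte to the text file */
  /* IF-THEN instead of a subsetting IF: no observation is discarded */
  if eof then call symputx('filesize', _n_);
run;
```

With this change dst should contain one observation per byte read, and the macro variable filesize is still set only on the last iteration.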

Steve1964
Obsidian | Level 7

Thanks!

Another question:

To my knowledge, the INPUT statement reads one character of the current data line in the current DATA step iteration and, without a double trailing @@, reads one character of the next data line in the next iteration. Is it right that the INPUT statement takes each character as a separate data line, since any line-ending character or sequence (LF, CR, CRLF) is disregarded?
