Steve1964
Obsidian | Level 7

I am using a segment of code from paper 052-2009, written by Rick Langston for SAS Global Forum 2009, to create a text file from HTML. The code reads:

filename in url 'https://www.zaobao.com/realtime/china/story20211010-1201937';
filename out 'd:\mydata\myfile.txt';
data _null_;
infile in lrecl=1 recfm=f end=eof;
file out lrecl=1 recfm=f;
input @1 x $char1.;
put @1 x $char1.;
if eof;
call symputx('filesize',_n_);
run;

I'm puzzled by the following:

1) What is a data line of the input HTML file? Do the options lrecl=1 recfm=f in the INFILE statement make the INPUT statement read at most one character each time?

2) How does the INPUT statement read a Chinese character into variable x, since the informat $char1. makes it read one character each time?

3) I want to create a SAS data set using the code

filename in url 'https://www.zaobao.com/realtime/china/story20211010-1201937';
filename out 'd:\mydata\myfile.txt';
data  dst;
infile in lrecl=1 recfm=f end=eof;
file out lrecl=1 recfm=f;
input @1 x $char1.;
put @1 x $char1.;
if eof;
call symputx('filesize',_n_);
run;

The SAS data set dst has only one observation, with a missing value. Why?

Kurt_Bremser
Super User

RECFM=F means records of a fixed length; any line-ending character or sequence (LF, CR, CRLF) is not treated as a record delimiter and is simply read as data bytes.

LRECL=1 means that every record is exactly one byte long. Multi-byte UTF-8 characters will be read piecemeal, each of their constituent bytes separately.
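To see this without depending on the web page, here is a minimal sketch that uses a temporary file. The fileref tmp, the data set name bytes, and the test bytes are assumptions made up for illustration: the UTF-8 encoding of one Chinese character (E4 B8 AD), followed by CR, LF, and the letter A.

```sas
/* Hypothetical scratch file, not part of the original question */
filename tmp temp;

/* Write six raw bytes: E4 B8 AD (one UTF-8 character), 0D 0A (CRLF), 41 ('A') */
data _null_;
  file tmp recfm=f lrecl=6;
  put 'E4B8AD0D0A41'x;
run;

/* Read the same file back with one byte per record */
data bytes;
  infile tmp recfm=f lrecl=1;
  input x $char1.;
  hex = put(x, $hex2.);  /* show each byte as two hex digits */
run;

proc print data=bytes;
  var hex;
run;
```

If this behaves as described, bytes should have six observations, one per byte: three for the single Chinese character, and one each for CR, LF, and A. So the line-ending bytes arrive as ordinary data, and no single value of x ever holds a complete multi-byte character.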

Your code defines a boolean variable (eof) for the end of input data which will be set to true when the last byte is read. Since you use it in a Subsetting IF, only that byte will end up in the dataset. But your code will write all web data to the text file (because the INPUT/PUT happen before the subsetting IF).
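A minimal sketch of one way around this, assuming the goal is to keep every byte as an observation in dst while still storing the byte count in the macro variable: replace the subsetting IF with an IF-THEN, so that only the CALL SYMPUTX is conditional.

```sas
filename in url 'https://www.zaobao.com/realtime/china/story20211010-1201937';
filename out 'd:\mydata\myfile.txt';

data dst;
  infile in lrecl=1 recfm=f end=eof;
  file out lrecl=1 recfm=f;
  input @1 x $char1.;
  put @1 x $char1.;
  /* IF-THEN instead of a subsetting IF: every iteration still   */
  /* reaches the implicit OUTPUT, so each byte is an observation */
  if eof then call symputx('filesize', _n_);
run;

%put NOTE: filesize=&filesize;
```

With this change dst should contain one observation per byte read, and the text file is written exactly as before.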

Steve1964
Obsidian | Level 7

Thanks!

Another question:

To my knowledge, the INPUT statement reads one character of the current data line in the current DATA step iteration, and reads one character of the next data line in the next iteration, when there is no double trailing @@. Is it right that the INPUT statement treats each character as a separate data line, since any line-ending character or sequence (LF, CR, CRLF) is disregarded?
