BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Steve1964
Obsidian | Level 7

I use a segment of code from the paper 052-2009  written by Rick Langston in SAS Global Forum 2009 to creating text file from HTML. The codes read

filename in url 'https://www.zaobao.com/realtime/china/story20211010-1201937';
filename out 'd:\mydata\myfile.txt';
data _null_;
infile in lrecl=1 recfm=f end=eof;
file out lrecl=1 recfm=f;
input @1 x $char1.;
put @1 x $char1.;
if eof;
call symputx('filesize',_n_);
run;

I'm puzzled by then followings:

1)What's a data line of input HTML file? Do options lrecl=1 recfm=m in infile statement let input statement read atmost one character each time?

2)How Does input statement read a Chinese character into variable x, Since  the informt $char1. makes it read one character eacht time? 

3) I want create sas data set using the code 

filename in url 'https://www.zaobao.com/realtime/china/story20211010-1201937';
filename out 'd:\mydata\myfile.txt';
data  dst;
infile in lrecl=1 recfm=f end=eof;
file out lrecl=1 recfm=f;
input @1 x $char1.;
put @1 x $char1.;
if eof;
call symputx('filesize',_n_);
run;

The SAS dataset dst only has on observation with missing value. Why?

1 ACCEPTED SOLUTION
3 REPLIES 3
Kurt_Bremser
Super User

RECFM=F means records of a fixed length, any line-ending character or sequence (LF, CR, CRLF) is disregarded and in fact read as bytes.

LRECL=1 means to always read one character as one record. UTF characters will be read peacemeal, each of their constituent bytes separately.

Your code defines a boolean variable (eof) for the end of input data which will be set to true when the last byte is read. Since you use it in a Subsetting IF, only that byte will end up in the dataset. But your code will write all web data to the text file (because the INPUT/PUT happen before the subsetting IF).

Steve1964
Obsidian | Level 7

Thanks!

 Another question:

To my knowledge, input statement reads one character of the current dataline in current data-step loop,and read one character of the next dataline in the next loop without a double trailing @@. Is it right that input statement take each character as a seperate dataline since  any line-ending character or sequence (LF, CR, CRLF) is disregarded ?

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 4340 views
  • 0 likes
  • 2 in conversation