IceFire
Calcite | Level 5

Hi all,

 

After I imported the stacked 1-row data (1.8 million rows) into SAS and ran the script below to define the value for each column, I noticed that the output data set has 936 more rows than the imported data.

 

I am just following the script that my colleague left me before she went on maternity leave. Could you please advise whether there is a problem with the script? Thank you.

(@ has been replaced by (a))

 

data input_file;
infile 'imported_file' truncover lrecl = 923;
input
(a)1 case_no $13.
(a)14 acct $17.
(a)31 patient_id $34.
(a)65 ptype $1.
(a)66 adate ddmmyy10.
(a)76 ddate ddmmyy10.
(a)86 alos $4.
(a)90 bdate ddmmyy10.
(a)100 agey $3.
(a)103 aged $3.
(a)106 filler1 $3.
(a)109 sex $1.
(a)110 dstat $2.
(a)112 bwt $4.
(a)116 diag01 $10. (a)126 diag02 $10. (a)136 diag03 $10. (a)146 diag04 $10. (a)156 diag05 $10. (a)166 diag06 $10. (a)176 diag07 $10. (a)186 diag08 $10. (a)196 diag09 $10. (a)206 diag10 $10. (a)216 diag11 $10. (a)226 diag12 $10. (a)236 diag13 $10. (a)246 diag14 $10. (a)256 diag15 $10. (a)266 diag16 $10. (a)276 diag17 $10. (a)286 diag18 $10. (a)296 diag19 $10. (a)306 diag20 $10. (a)316 diag21 $10. (a)326 diag22 $10. (a)336 diag23 $10. (a)346 diag24 $10. (a)356 diag25 $10. (a)366 diag26 $10. (a)376 diag27 $10. (a)386 diag28 $10. (a)396 diag29 $10. (a)406 diag30 $10. (a)416 oper01 $10. (a)426 oper02 $10. (a)436 oper03 $10. (a)446 oper04 $10. (a)456 oper05 $10. (a)466 oper06 $10. (a)476 oper07 $10. (a)486 oper08 $10. (a)496 oper09 $10. (a)506 oper10 $10. (a)516 oper11 $10. (a)526 oper12 $10. (a)536 oper13 $10. (a)546 oper14 $10. (a)556 oper15 $10. (a)566 oper16 $10. (a)576 oper17 $10. (a)586 oper18 $10. (a)596 oper19 $10. (a)606 oper20 $10. (a)616 oper21 $10. (a)626 oper22 $10. (a)636 oper23 $10. (a)646 oper24 $10. (a)656 oper25 $10. (a)666 oper26 $10. (a)676 oper27 $10. (a)686 oper28 $10. (a)696 oper29 $10. (a)706 oper30 $10.
(a)716 GrouperVersion $17.
(a)733 drg $6.
(a)739 Pat_ROM $1.
(a)740 Pat_SOI $1.
(a)741 Gstatus $2.
(a)743 RW $7.
(a)750 ELOS $5.
(a)755 EhighTrim $3.
(a)758 ElowTrim $3.
(a)761 dxv1_30 $30.
(a)791 Dx_ROM $30.
(a)821 Dx_SOI $30.
(a)851 px_vflag $30.
(a)881 u_pttype $1.
(a)882 px_class $30.
(a)912 newline$2.
;
format adate ddate bdate ddmmyy10.;
run;

IceFire
Calcite | Level 5

Below is the log from SAS system. The total rows for "import_file" is 1855371

 

ODS _ALL_ CLOSE;
OPTIONS DEV=PNG;
GOPTIONS XPIXELS=0 YPIXELS=0;
FILENAME EGSR TEMP;
ODS tagsets.sasreport13(ID=EGSR) FILE=EGSR
STYLE=HTMLBlue
STYLESHEET=(URL="file:///C:/Program%20Files/SASHome/SASEnterpriseGuide/7.1/Styles/HTMLBlue.css")
NOGTITLE
NOGFOOTNOTE
GPATH=&sasworklocation
ENCODING=UTF8
options(rolap="on") ;
NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR
GOPTIONS ACCESSIBLE;
data input_file;
infile 'imported_file' truncover lrecl = 923;
input
(a)1 case_no $13.
(a)14 acct $17.
(a)31 patient_id $34.
(a)65 ptype $1.
(a)66 adate ddmmyy10.
(a)76 ddate ddmmyy10.
(a)86 alos $4.
(a)90 bdate ddmmyy10.
(a)100 agey $3.
(a)103 aged $3.
(a)106 filler1 $3.
(a)109 sex $1.
(a)110 dstat $2.
(a)112 bwt $4.
(a)116 diag01 $10. (a)126 diag02 $10. (a)136 diag03 $10. (a)146 diag04 $10. (a)156 diag05 $10. (a)166 diag06 $10. (a)176 diag07 $10. (a)186 diag08 $10. (a)196 diag09 $10. (a)206 diag10 $10. (a)216 diag11 $10. (a)226 diag12 $10. (a)236 diag13 $10. (a)246 diag14 $10. (a)256 diag15 $10. (a)266 diag16 $10. (a)276 diag17 $10. (a)286 diag18 $10. (a)296 diag19 $10. (a)306 diag20 $10. (a)316 diag21 $10. (a)326 diag22 $10. (a)336 diag23 $10. (a)346 diag24 $10. (a)356 diag25 $10. (a)366 diag26 $10. (a)376 diag27 $10. (a)386 diag28 $10. (a)396 diag29 $10. (a)406 diag30 $10. (a)416 oper01 $10. (a)426 oper02 $10. (a)436 oper03 $10. (a)446 oper04 $10. (a)456 oper05 $10. (a)466 oper06 $10. (a)476 oper07 $10. (a)486 oper08 $10. (a)496 oper09 $10. (a)506 oper10 $10. (a)516 oper11 $10. (a)526 oper12 $10. (a)536 oper13 $10. (a)546 oper14 $10. (a)556 oper15 $10. (a)566 oper16 $10. (a)576 oper17 $10. (a)586 oper18 $10. (a)596 oper19 $10. (a)606 oper20 $10. (a)616 oper21 $10. (a)626 oper22 $10. (a)636 oper23 $10. (a)646 oper24 $10. (a)656 oper25 $10. (a)666 oper26 $10. (a)676 oper27 $10. (a)686 oper28 $10. (a)696 oper29 $10. (a)706 oper30 $10.
(a)716 GrouperVersion $17.
(a)733 drg $6.
(a)739 Pat_ROM $1.
(a)740 Pat_SOI $1.
(a)741 Gstatus $2.
(a)743 RW $7.
(a)750 ELOS $5.
(a)755 EhighTrim $3.
(a)758 ElowTrim $3.
(a)761 dxv1_30 $30.
(a)791 Dx_ROM $30.
(a)821 Dx_SOI $30.
(a)851 px_vflag $30.
(a)881 u_pttype $1.
(a)882 px_class $30.
(a)912 newline$2.

;
format adate ddate bdate ddmmyy10.;
run;

 

NOTE: The infile 'imported_file' is:
Filename=imported_file,
Access Permission=-rw-r--r--,
Last Modified=27Dec2024:17:59:32,
File Size (bytes)=1642040648

NOTE: 1856307 records were read from the infile 'imported_file'.
The minimum record length was 881.
The maximum record length was 911.
NOTE: The data set WORK.input_file has 1856307 observations and 90 variables.
NOTE: Compressing data set WORK.input_file decreased size by 83.77 percent.
Compressed is 4243 pages; un-compressed would require 26146 pages.
NOTE: DATA statement used (Total process time):
real time 6.45 seconds
cpu time 6.17 seconds


GOPTIONS NOACCESSIBLE;
%LET _CLIENTTASKLABEL=;
%LET _CLIENTPROCESSFLOWNAME=;
%LET _CLIENTPROJECTPATH=;
%LET _CLIENTPROJECTPATHHOST=;
%LET _CLIENTPROJECTNAME=;
%LET _SASPROGRAMFILE=;
%LET _SASPROGRAMFILEHOST=;
 ;*';*";*/;quit;run;
ODS _ALL_ CLOSE;

QUIT; RUN;

mkeintz
PROC Star

 


@IceFire wrote:

Below is the log from SAS system. The total rows for "import_file" is 1855371

How do you know that the total row count is 1,855,371, given that your log below reports 1,856,307 records? (I presume that "import_file" above refers to the same file as "imported_file" below.)

 


NOTE: The infile 'imported_file' is:
Filename=imported_file,
Access Permission=-rw-r--r--,
Last Modified=27Dec2024:17:59:32,
File Size (bytes)=1642040648

NOTE: 1856307 records were read from the infile 'imported_file'.



 

Tom
Super User

Usually the reason for finding more LINES in a text file than you thought were written to it is that one or more of the variables written to the file contained end-of-line characters.

 

Sometimes you can solve this easily in SAS by using the TERMSTR= option of the INFILE statement. But that only works when the real ends of lines are marked by a CRLF combination and the embedded character strings contain only a single CR or a single LF. In that case you can use TERMSTR=CRLF on the INFILE statement. Try it:

infile 'imported_file' truncover lrecl = 923 termstr=crlf ;
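
Before changing the real import, one cheap way to test this idea is to count the records SAS sees under each line-end convention. The sketch below is only an illustration: the names `n` and `done` are made up here, and LRECL=32767 is just a generous assumption so that no line is cut short while counting.

```sas
/* Count records using the session's default line terminator. */
data _null_;
  infile 'imported_file' lrecl=32767 end=done;
  input;                /* load each line into the buffer, create no variables */
  n + 1;                /* running record count */
  if done then put 'Records with default TERMSTR: ' n;
run;

/* Count again, treating only CR+LF pairs as the end of a line. */
data _null_;
  infile 'imported_file' lrecl=32767 termstr=crlf end=done;
  input;
  n + 1;
  if done then put 'Records with TERMSTR=CRLF: ' n;
run;
```

If the second count matches the row count you expected, embedded single CR or LF characters are the likely cause. If it is far smaller than the first, the file probably does not use CRLF line ends at all, and TERMSTR=CRLF is not the right fix.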

Another possibility, suggested by your use of the LRECL= option set to such a strange number, is that perhaps you did the same thing when writing the file but some of the lines being written needed more than 923 characters. That would have caused SAS to move to a new line when writing those values.
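
To see that wrapping effect on a small scale, here is a toy sketch. Everything in it is hypothetical (the `demo` fileref, the variables `a` and `b`, and the tiny LRECL=10); it relies on the FILE statement's default FLOWOVER behavior, where an item that does not fit on the current line is written to a new one.

```sas
filename demo temp;     /* hypothetical scratch file, not your real data */

/* Write ONE observation whose values need 16 columns into a 10-column line. */
data _null_;
  file demo lrecl=10;
  a = 'ABCDEFGH';
  b = 'IJKLMNOP';
  put a $8. b $8.;      /* b does not fit after a, so it flows to a new line */
run;

/* Count how many lines the single PUT statement actually produced. */
data _null_;
  infile demo end=done;
  input;
  n + 1;
  if done then put 'Lines in demo file: ' n;
run;
```

If the count printed in the log is larger than the number of observations written, the writing step split observations across lines, which is the same mechanism that could add extra lines to the real file.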

 

You can use the LENGTH= option of the INFILE statement if you want to analyze the lengths of the lines in your text file:

data lengths;
  infile 'imported_file' length=ll;
  input;
  row+1;
  length=ll;
run;
proc means n min max mean;
  var length;
run;
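
A possible follow-up, sketched under some assumptions: tabulate the line lengths and flag lines that still contain a carriage return after SAS has split the file. The names `line_check`, `line`, `len` and `has_cr` are invented for this example, and the `$char1000.` width simply assumes that no line is longer than 1,000 characters (the log above reports a maximum of 911).

```sas
/* Keep each line's length and whether it contains a carriage return. */
data line_check;
  infile 'imported_file' length=ll lrecl=32767 truncover;
  input line $char1000.;               /* raw text of the line, blanks preserved */
  row + 1;                             /* line number within the file */
  len = ll;                            /* length reported by LENGTH= */
  has_cr = (index(line, '0D'x) > 0);   /* 1 if a CR character is present */
  drop line;
run;

/* Distribution of line lengths and of the CR flag. */
proc freq data=line_check;
  tables len has_cr / missing;
run;
```

A handful of unusually short lines, or a few lines with has_cr = 1, would point to embedded line breaks inside values. If every line shows has_cr = 1, the file most likely has CRLF line ends that are being read with an LF terminator.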

 

 
