☑ This topic is solved.
A_Kh
Barite | Level 11

Dear Community,

 

I'm wondering whether there is a way to read only a specific part of a text file into SAS. Below is an example of the data, where I need to read only rows 1 to 26 under P(#); the rest of the file is unnecessary. I could read it with PROC IMPORT and then keep the first 26 observations in a following DATA step, but with hundreds of files, and a different number of observations to read on each run, that requires a lot of manual effort. I'm looking for a better technique that reads the data based on its pattern. Any ideas or tips would be appreciated. Thank you!

P(#)         Est          SE        Grad
     1        0.30        0.05        0.02
     2        1.05        0.07       -0.01
     3        2.88        0.11        0.01
     4        1.23        0.11       -0.01
     5       -0.05        0.04        0.02
     6        0.84        0.06       -0.01
     7        0.94        0.05        0.02
     8        1.04        0.07       -0.01
     9        0.33        0.04        0.02
    10        0.85        0.06       -0.01
    11        2.09        0.09        0.02
    12        1.34        0.10       -0.02
    13        4.29        0.21        0.01
    14        1.90        0.17       -0.02
    15        1.51        0.06        0.02
    16        0.97        0.08       -0.01
    17        3.35        0.16        0.02
    18        1.89        0.15       -0.02
    19        2.40        0.09        0.02
    20        1.32        0.10       -0.02
    21       -0.26        0.04        0.02
    22        0.87        0.06       -0.01
    23        0.15        0.05        0.02
    24        1.01        0.07       -0.01
    25        0.00          --
    26        1.00          --
-2*log(LL) = 33408.05
 #Cycles    A-time    E-time    D-time    M-time    S-time     Total
      26      0.00      0.10      0.00      0.00      0.12      0.22

Parameter Segments
 Segment 1:
  Items= 1
  Parms= 1 2
 Segment 2:
  Items= 2
  Parms= 3 4
8 Replies
Tom
Super User

There is no reason to use PROC IMPORT to read a file with only 4 variables. You can write the DATA step with less code than the PROC IMPORT step would take, and you then have complete control over how the file is read.

 

First let's convert your example back into a physical file:

options parmcards=sample;
filename sample temp;

parmcards4;
P(#)         Est          SE        Grad
     1        0.30        0.05        0.02
     2        1.05        0.07       -0.01
     3        2.88        0.11        0.01
     4        1.23        0.11       -0.01
     5       -0.05        0.04        0.02
     6        0.84        0.06       -0.01
     7        0.94        0.05        0.02
     8        1.04        0.07       -0.01
     9        0.33        0.04        0.02
    10        0.85        0.06       -0.01
    11        2.09        0.09        0.02
    12        1.34        0.10       -0.02
    13        4.29        0.21        0.01
    14        1.90        0.17       -0.02
    15        1.51        0.06        0.02
    16        0.97        0.08       -0.01
    17        3.35        0.16        0.02
    18        1.89        0.15       -0.02
    19        2.40        0.09        0.02
    20        1.32        0.10       -0.02
    21       -0.26        0.04        0.02
    22        0.87        0.06       -0.01
    23        0.15        0.05        0.02
    24        1.01        0.07       -0.01
    25        0.00          --
    26        1.00          --
-2*log(LL) = 33408.05
 #Cycles    A-time    E-time    D-time    M-time    S-time     Total
      26      0.00      0.10      0.00      0.00      0.12      0.22

Parameter Segments
 Segment 1:
  Items= 1
  Parms= 1 2
 Segment 2:
  Items= 2
  Parms= 3 4
;;;;

If you know the file always has at least 26 lines of data and you want only the first 26, you can use the OBS= option. OBS= counts physical lines including the header, so OBS=27 stops after the 26th data line:

data want;
  infile sample firstobs=2 obs=27 truncover;
  input p1-p4;
run;

If the number of lines varies, then you could probably decide when to stop, or which observations to write out, based on the value read for the first column.
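One way to act on the first column is sketched below, assuming every data row starts with a whole-number index and the first line whose first token is not a whole number marks the end of the table. The sample rows are an excerpt of the example above, inlined with DATALINES so the step runs on its own; the helper variable `first` is illustrative only.

```sas
data want;
  /* Excerpt of the example file, inlined so the sketch is self-contained */
  infile datalines firstobs=2 truncover;
  length first $32;
  /* Read the first token as text and hold the line with the trailing @ */
  input first $ @;
  /* Stop at the first line whose first token is not a whole number;
     a blank line also stops because its first token is missing */
  if missing(first) or notdigit(strip(first)) then stop;
  /* Re-read the held line from column 1; ?? quietly turns '--' into missing */
  input @1 (P Est SE Grad) (??);
  drop first;
datalines;
P(#)         Est          SE        Grad
     1        0.30        0.05        0.02
     2        1.05        0.07       -0.01
    25        0.00          --
    26        1.00          --
-2*log(LL) = 33408.05
 #Cycles    A-time    E-time    D-time    M-time    S-time     Total
;
```

The check uses the text of the first token rather than a numeric read, so no invalid-data messages are produced while deciding where the table ends.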

 

A_Kh
Barite | Level 11

Hi @Tom ,

Thank you for your answer, it is very helpful! As I have never used the INFILE statement to read data before, I don't know much about its power. What would the code be for reading the actual file (C:\Users\Files\dbg.text) in your example?
The number of observations varies per file, but the data layout does not (e.g., roughly the first 25-50 lines are numeric values in 4 columns, followed by lines of unstructured text). Ideally I would also like to know how to stop reading once the numeric values end.

Kurt_Bremser
Super User

Read until you find a trigger to stop:

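data want;
infile "path to your file" firstobs=2 truncover;
input @;                           /* read the line into _INFILE_ and hold it */
if index(_infile_,"=") then stop;  /* the "-2*log(LL) = ..." line ends the table */
input p_ est se grad;
run;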

The step skips the header line and stops as soon as an equal sign appears in the current input line (the -2*log(LL) = ... line).

A_Kh
Barite | Level 11

Thank you, @Kurt_Bremser !

This gives the desired output, but the log shows many _ERROR_ messages caused by invalid data: starting from row 27, the SE variable contains only -- (a double dash), which triggers the error. Can this be avoided with an INFILE statement option?

P(#) Est SE Grad
1 -0.91 0.04 0.00
2 -1.01 0.04 0.00
3 -0.96 0.04 0.00
.
.
25 0.98 0.85 0.00
26 -1.30 1.32 0.00
27 0.00 --
28 0.00 --
29 0.00 --
30 0.00 --
.
.
89 0.00 --
90 0.00 --
91 1.00 --
-2*log(LL) = 85741.39
#Cycles A-time E-time D-time M-time S-time Total
535 0.00 2.69 0.00 0.48 0.61 3.78


Below is the SAS code and its log.

%macro import;
	%do i= 1 %to 31;
		filename dbg_new "&newfolder.\&&file&i...txt";
		data new_%scan(&&file&i, -1, -)&i;
			infile dbg_new firstobs=2 truncover;
			input @;
			if index(_infile_,"=") then stop;
			input p_ est se grad;
		run;
	%end;
%mend; 

%import;

 

NOTE: Invalid data for se in line 28 29-30.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+--
28            27        0.00          -- 30
p_=27 est=0 se=. grad=. _ERROR_=1 _INFILE_=27        0.00          -- _N_=27
NOTE: Invalid data for se in line 29 29-30.
29            28        0.00          -- 30
p_=28 est=0 se=. grad=. _ERROR_=1 _INFILE_=28        0.00          -- _N_=28
NOTE: Invalid data for se in line 30 29-30.
30            29        0.00          -- 30
p_=29 est=0 se=. grad=. _ERROR_=1 _INFILE_=29        0.00          -- _N_=29
NOTE: Invalid data for se in line 31 29-30.
31            30        0.00          -- 30
p_=30 est=0 se=. grad=. _ERROR_=1 _INFILE_=30        0.00          -- _N_=30
NOTE: Invalid data for se in line 32 29-30.
32            31        0.00          -- 30
p_=31 est=0 se=. grad=. _ERROR_=1 _INFILE_=31        0.00          -- _N_=31
NOTE: Invalid data for se in line 33 29-30.
33            32        0.00          -- 30
p_=32 est=0 se=. grad=. _ERROR_=1 _INFILE_=32        0.00          -- _N_=32
NOTE: Invalid data for se in line 34 29-30.
34            33        0.00          -- 30
p_=33 est=0 se=. grad=. _ERROR_=1 _INFILE_=33        0.00          -- _N_=33
NOTE: Invalid data for se in line 35 29-30.
35            34        0.00          -- 30
p_=34 est=0 se=. grad=. _ERROR_=1 _INFILE_=34        0.00          -- _N_=34
NOTE: Invalid data for se in line 36 29-30.
36            35        0.00          -- 30
p_=35 est=0 se=. grad=. _ERROR_=1 _INFILE_=35        0.00          -- _N_=35
NOTE: Invalid data for se in line 37 29-30.
37            36        0.00          -- 30
p_=36 est=0 se=. grad=. _ERROR_=1 _INFILE_=36        0.00          -- _N_=36
NOTE: Invalid data for se in line 38 29-30.
38            37        0.00          -- 30
p_=37 est=0 se=. grad=. _ERROR_=1 _INFILE_=37        0.00          -- _N_=37
NOTE: Invalid data for se in line 39 29-30.
39            38        0.00          -- 30
p_=38 est=0 se=. grad=. _ERROR_=1 _INFILE_=38        0.00          -- _N_=38
NOTE: Invalid data for se in line 40 29-30.
40            39        0.00          -- 30
p_=39 est=0 se=. grad=. _ERROR_=1 _INFILE_=39        0.00          -- _N_=39
NOTE: Invalid data for se in line 41 29-30.
41            40        0.00          -- 30
p_=40 est=0 se=. grad=. _ERROR_=1 _INFILE_=40        0.00          -- _N_=40
NOTE: Invalid data for se in line 42 29-30.
42            41        0.00          -- 30
p_=41 est=0 se=. grad=. _ERROR_=1 _INFILE_=41        0.00          -- _N_=41
NOTE: Invalid data for se in line 43 29-30.
43            42        0.00          -- 30
p_=42 est=0 se=. grad=. _ERROR_=1 _INFILE_=42        0.00          -- _N_=42
NOTE: Invalid data for se in line 44 29-30.
44            43        0.00          -- 30
p_=43 est=0 se=. grad=. _ERROR_=1 _INFILE_=43        0.00          -- _N_=43
NOTE: Invalid data for se in line 45 29-30.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+--
45            44        0.00          -- 30
p_=44 est=0 se=. grad=. _ERROR_=1 _INFILE_=44        0.00          -- _N_=44
NOTE: Invalid data for se in line 46 29-30.
46            45        0.00          -- 30
p_=45 est=0 se=. grad=. _ERROR_=1 _INFILE_=45        0.00          -- _N_=45
NOTE: Invalid data for se in line 47 29-30.
WARNING: Limit set by ERRORS= option reached.  Further errors of this type will not be
         printed.
47            46        0.00          -- 30
p_=46 est=0 se=. grad=. _ERROR_=1 _INFILE_=46        0.00          -- _N_=46
NOTE: 92 records were read from the infile DBG_NEW.
      The minimum record length was 21.
      The maximum record length was 42.
NOTE: The data set WORK.NEW_DBG31 has 91 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

.. 

Tom
Super User
Accepted solution

You can use the ?? input modifier to suppress the messages about invalid data.

data want;
  infile sample firstobs=2 truncover;
  input (P Est SE Grad) (??);
  if missing(p) then stop;
run;
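A small sketch of what the modifier changes, using one line shaped like those in the log above (the names mods, se_plain, se_quiet, err_plain and err_quiet are illustrative). The line is read twice: plain list input turns -- into a missing value and sets _ERROR_ to 1, which is what the _ERROR_=1 lines in the log above reflect; re-reading the held line with ?? gives the same missing value but leaves _ERROR_ at 0.

```sas
data mods;
  infile datalines truncover;
  /* Plain list input: '--' is invalid numeric data, so _ERROR_ becomes 1 */
  input p est se_plain @;
  err_plain = _error_;
  _error_ = 0;                      /* reset before the second read */
  /* Re-read the held line with ??: SE is still missing, _ERROR_ stays 0 */
  input @1 (p est se_quiet) (??);
  err_quiet = _error_;
datalines;
25 0.00 --
;
run;
```

Note that ?? is a modifier on the INPUT statement, not an INFILE option.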

You could also just read ALL of the files in one data step.

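data all;
  length fileno 8 filename $256 ;
  do fileno=1 to 31 ;
    /* build the path from the macro variables FILE1-FILE31 */
    filename = cats("&newfolder\",symget(cats('file',fileno)),'.txt');
    /* FILEVAR= opens a new file whenever FILENAME changes;
       TRUNCOVER stops short lines such as '25 0.00 --' from flowing over */
    infile text filevar=filename firstobs=2 truncover end=eof;
    p=0;
    do while(not missing(p) and not eof);
      input (P Est SE Grad) (??);
      if not missing(p) then output;
    end;
  end;
  stop;  /* all files read; without STOP the step would start over at file 1 */
run;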
A_Kh
Barite | Level 11

Thank you @Tom  and  @Kurt_Bremser !

I appreciate your support; this way of reading data in SAS is something fundamental that I should learn.

@Tom , regarding the second part of the code: each of the 31 files is located in a separate folder and has a different name; the only thing they have in common is the .txt extension. Therefore, I hard-coded the list of files into &file1-&file31 in earlier steps. This part is something I could handle myself, but thank you so much for your input!

 

Tom
Super User

Note that it is easier to put the list of names into a DATASET than to try to figure out how to use the macro language to generate code.

data files;
  infile cards truncover ;
  input filename $256. ;
cards;
filename1.txt
filename2.txt
;

data want;
  set files;
  filevar=filename;
  infile dummy firstobs=2 truncover filevar=filevar end=eof;
  do while (not eof);
      input ....
      output;
  end;
run;

You could even combine the two data steps into one if you wanted.
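A sketch of that combined step, under a few assumptions: the INPUT statement is the one used earlier in this thread, reading stops at the first row where P is missing, and the list of files is read with a second INFILE statement in the same step. To keep the sketch runnable on its own, the list holds the filerefs of two temporary demo files (f1 and f2, hypothetical) and PATHNAME() turns each into a physical path; with real files you would list the full paths and read them straight into FNAME.

```sas
/* Demo setup only: two small temp files standing in for the real outputs */
filename f1 temp;
filename f2 temp;
data _null_;
  file f1;
  put 'P(#)  Est  SE  Grad' / '1 0.30 0.05 0.02' / '2 1.05 0.07 -0.01'
      / '-2*log(LL) = 33408.05';
  file f2;
  put 'P(#)  Est  SE  Grad' / '1 -0.91 0.04 0.00' / '25 0.00 --'
      / '-2*log(LL) = 85741.39';
run;

/* One step: read the list of files and the contents of each file */
data want;
  length ref $8 fname $256;
  infile datalines truncover;        /* switch to the list of files */
  input ref $8.;                     /* step ends when the list runs out */
  fname = pathname(strip(ref));      /* demo only: real code would list paths */
  infile dummy filevar=fname firstobs=2 truncover end=eof;
  do while (not eof);
    input (P Est SE Grad) (??);
    if missing(p) then leave;        /* first non-numeric row ends the table */
    output;
  end;
datalines;
f1
f2
;
```

No STOP is needed here because the step ends on its own when the INPUT statement for the file list reads past its last line.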

A_Kh
Barite | Level 11

Awesome! This really works. 

Thank you so much, @Tom  and @Kurt_Bremser !
