jpprovost
Quartz | Level 8

Hi there,

 

I'm looking for a way to verify that a file is "correctly" imported into SAS. I have a SAS program that uses data steps (with INFILE) to import data from .txt or .csv files. However, the program has been running for a long time now, and I was wondering today: if a new column is added to the file (many of these files are produced, aka dumped, automatically by servers), how would I become aware of it?

 

Since the SAS program specifies the position, field name and format, how could I set up a "flag" that says, "Hey, there are more columns in the file than you specified in the INFILE"?

 

TL;DR: Is there a way to get the structure of the file (e.g., the number of header fields and the number of lines) before importing it?

 

(I don't know whether the GUESSINGROWS=1 option is appropriate in this case.)

 

I found this post, but I'm not sure whether it could be a solution to my question.

https://communities.sas.com/t5/SAS-Programming/Reading-header-and-trailer-data-from-text-file/td-p/5... 

 

As always, thanks in advance for your kindness.

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

Could you just read the first row (with the headers) and count the number of columns?

View solution in original post

15 REPLIES
ChrisNZ
Tourmaline | Level 20

Could you just read the first row (with the headers) and count the number of columns?

jpprovost
Quartz | Level 8

@ChrisNZ wrote:

Could you just read the first row (with the headers) and count the number of columns?


Gosh. I was thinking so far "outside the box" that your reply made me realize how "easy" it could be. My bad (I'm new to SAS).

Is there a way to tell SAS to only "infile" the first two rows?

 

EDIT: I found that I could use the OBS= option (e.g. OBS=2). Sorry for the inconvenience, @ChrisNZ!
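
A minimal sketch of that OBS= idea (the fileref name and path here are hypothetical, just to make the snippet self-contained):

filename raw 'path/to/myfile.csv';   /* hypothetical path to the dumped file */

data _null_;
  infile raw obs=2;                  /* stop reading after the first two lines */
  input;                             /* load each raw line into _INFILE_ */
  put _infile_;                      /* echo it to the log for inspection */
run;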

ChrisNZ
Tourmaline | Level 20

Even one row is enough.

Pseudo code:

data T;
  infile F dlm=',';
  input;
  call symputx('NB_COLS', countw(_INFILE_,',') );
  stop;
run;
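
To turn NB_COLS into the "flag" asked for above, one option is a small check afterwards (a sketch; the expected count of 15 is a made-up number):

%let expected_cols = 15;   /* hypothetical: how many columns the INPUT statement expects */

%macro check_cols;
  %if &NB_COLS ne &expected_cols %then
    %put WARNING: file has &NB_COLS columns, but the program expects &expected_cols..;
%mend check_cols;
%check_cols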

 

 

jpprovost
Quartz | Level 8

Will try it for sure tomorrow morning.

What is this function?

countw

 

ScottBass
Rhodochrosite | Level 12

Various approaches:

 

filename temp temp;

* create csv ;
proc export data=sashelp.class file=temp dbms=csv;
run;

* check csv ;
data _null_;
   infile temp;
   input;
   put _infile_;
run;

* option 1: count delimiters in the header ;
* this assumes there is no edge case of a quoted delimiter in the header ;
data _null_;
   infile temp;
   input;
   call symputx('num_cols',countc(_infile_,',')+1,'G');
   stop;
run;

%put &=num_cols;

* option 2: import the header and count columns ;
proc import file=temp out=class dbms=csv replace;
   getnames=yes;
run;

data _null_;
   if 0 then set class;
   length varname $32;
   do until (varname eq 'varname');
      num_cols+1;
      call vnext(varname);
   end;
   call symputx('num_cols',num_cols-1,'G');
   stop;
run;

%put &=num_cols;

* option 3: dictionary tables ;
* this can have a performance issue if you have a lot of allocated librefs, esp. RDBMS ;
proc sql noprint;
   select count(0) into :num_cols trimmed
   from dictionary.columns
   where upcase(libname)='WORK' and upcase(memname)='CLASS';
quit;

%put &=num_cols;

* option 4: proc contents ;
proc contents data=class out=columns (keep=name) noprint;
run;

%let num_cols=%nobs(columns);
%put &=num_cols;

What you do with &num_cols in your downstream code is up to you.
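
For instance, a sketch that stops the job when the layout has changed (the expected count of 5 is an assumption matching SASHELP.CLASS in this example):

%macro guard_import(expected=5);
  %if &num_cols ne &expected %then %do;
    %put ERROR: layout change detected: &num_cols columns found, &expected expected.;
    %abort cancel;   /* stop the submitted program so nothing is read with the wrong layout */
  %end;
%mend guard_import;

%guard_import(expected=5)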

 

I once had to do something similar, where I had to read a delimited file created by a web service.  The structure could be changed by the customer via a web interface. 

 

In that scenario, there was a max number of columns available from the web service, along with known attributes.  But the customer could choose a subset, as well as reorder the columns.

 

I created a metadata table in the code via datalines (the metadata was a bit like proc contents output).  I used approach #1 to get the list of column names from the header, as well as the order of the columns (a sort column).  I then inner joined with the metadata to get the known column attributes and the correct order from the header itself, then generated the correct attrib statements and code to import the dynamic data.

 

HTH...


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
ScottBass
Rhodochrosite | Level 12

@ScottBass wrote:

I once had to do something similar, where I had to read a delimited file created by a web service.  The structure could be changed by the customer via a web interface. 

 

In that scenario, there was a max number of columns available from the web service, along with known attributes.  But the customer could choose a subset, as well as reorder the columns.

 

I created a metadata table in the code via datalines (the metadata was a bit like proc contents output).  I used approach #1 to get the list of column names from the header, as well as the order of the columns (a sort column).  I then inner joined with the metadata to get the known column attributes and the correct order from the header itself, then generated the correct attrib statements and code to import the dynamic data.

 

HTH...


 

Something like this:

 

data class;
   * reorder columns ;
   format sex name weight height age;
   set sashelp.class;
run;

* create metadata (pretend this was via datalines) ;
proc contents data=sashelp.class out=metadata (keep=name type length label) noprint;
run;

* create a CSV with a subset of the columns ;
data subset / view=subset;
   set class;
   keep name age sex;
run;

filename temp temp;
proc export data=subset file=temp dbms=csv replace;
run;

* get the structure of this csv file ;
data columns;
   length name $32;
   infile temp;
   input;
   do i=1 to countc(_infile_,',')+1;
      name=scan(_infile_,i,',');
      sort+1;
      output;
   end;
   keep name sort;
   stop;
run;

proc sql;
   create table attribs as
   select c.name, m.type, m.length, m.label
   from metadata m
   join columns c
   on upcase(m.name)=upcase(c.name)
   order by c.sort;
quit;

%macro attribs;
%let label=%trim(&label);
attrib &name
length=%if (&type eq 2) %then $&length; %else &length; 
label="&label"
;
%mend;

data final;
   %loop_control(control=attribs,mname=attribs);
   infile temp dlm=',' dsd firstobs=2;
   input (_all_) (:);
run;

https://github.com/scottbass/SAS/blob/master/Macro/loop_control.sas


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
jpprovost
Quartz | Level 8

@ScottBass wrote:

[... see the full reply and code above ...]


I will keep this in my bookmarks for future use. Very useful.

 

I just opened some files dumped by the server and realized that there are no headers. I took a look at the code (written by someone else) and realized that the field names are user-defined. Since the files don't have any headers, do you think I could use the length of the first line as an indicator of "columns"?

 

For example, the code for loading those files looks like:

 

data file1 ;
  infile 'myfile.txt' MISSOVER LRECL=153;
  input
	@1   Field1    10.
	@11  Field2    $15.
	@26  Field3   $1.
	@27  Field4  $1.
	@28  Field5   $2.
	@30  Field6   $6.
	@36  Field7   $60.
	@96  Field8   $2.
	@98  Field9   9.
	@107 Field10   13.
	@120 Field11   11.
	@131 Field12   12.
	@143 Field13   $10.
	@153 Field14  $1.
	@154 Field15  $3.
	;

run ;

Thanks for the help!

Tom
Super User

For that kind of structure you will probably want to check that the length of the lines is appropriate. For example, your posted code truncates the input lines at 153 bytes and then tries to read data up to byte number 156.

 

Also check key variables to make sure they have valid values. For example FIELD14 might only allow values of 'A' or 'B'.  If something else appears in byte number 153 of the line then it is a good indication that the file is not right.

 

PS: Never use the antiquated MISSOVER option when using formatted input. If your input line had only 155 bytes, your input of FIELD15 would be missing, even if bytes 154 and 155 were not blank. Use the more robust TRUNCOVER option instead.

jpprovost
Quartz | Level 8
How do I check the length of a line?
Tom
Super User

There is an INFILE option that will set a variable with the length of the line.  You can also test the _INFILE_ automatic variable.

infile 'myfile.txt' length=line_length1 ;
input @;
line_length2=length(_infile_);
line_length3=lengthn(_infile_);

The difference between the two methods is that the first one will count any trailing spaces on the line. The difference between the LENGTH() and LENGTHN() functions shows up when the line only contains blanks: in that case LENGTH() will return 1 and LENGTHN() will return 0.
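
Putting the two suggestions together for the fixed-width file above, a rough sketch might look like this (156 is the last byte the original INPUT statement reads; the path, LRECL, and expected length are assumptions about the real file):

data file1;
  infile 'myfile.txt' truncover lrecl=32767 length=line_len;
  input @;                                    /* load the record and hold it */
  if line_len ne 156 then                     /* 156 = last byte the INPUT below reads */
    put 'WARNING: unexpected record length ' line_len= ' at record ' _n_=;
  input @1 Field1 10. @11 Field2 $15. /* ...remaining fields as in the original program... */ @154 Field15 $3.;
run;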

jpprovost
Quartz | Level 8

@Tom wrote:

There is an INFILE option that will set a variable with the length of the line.  You can also test the _INFILE_ automatic variable.

infile 'myfile.txt' length=line_length1 ;
input @;
line_length2=length(_infile_);
line_length3=lengthn(_infile_);

The difference between the two methods is that the first one will count any trailing spaces on the line. The difference between the LENGTH() and LENGTHN() functions shows up when the line only contains blanks: in that case LENGTH() will return 1 and LENGTHN() will return 0.


 

Hi @Tom ,

 

I tried to do this:

data t;
  infile 'myfile.txt' truncover;
  input @;
run;

And it returns no observations, but the Obs column seems to count the total number of "lines".

 

I tried your code, and it gives me 155 in both columns for every line.

 

However, I was wondering what the usage of this is:

length=line_length1;

 

Tom
Super User
Sounds like it did what you asked it to do then.
jpprovost
Quartz | Level 8
Yes, a big thanks to you, Tom.
But I don't understand what length=line_length1 does?

