Hi there,
I was looking for a way to verify that a file is "correctly" imported into SAS. I have a SAS program that uses data steps (with INFILE) to import data from a .txt or a .csv file. However, the program has been running for a long time now, and I was thinking today: since many of these files are produced (i.e. dumped) automatically by servers, if a new column were added to a file, how would I become aware of it?
Since the SAS program specifies the position, field name, and format of each column, how could I set up a "flag" that says: "Hey, there are more columns in the file than you specified in the INFILE"?
TL;DR: Is there a way to get the structure of the file (e.g. number of headers and lines) before importing it?
(I don't know if the GUESSINGROWS=1 option is appropriate in this case.)
I found this post, but I don't understand if it could be a solution to my question.
As always, thanks in advance for your kindness.
Could you just read the first row (with the headers) and count the number of columns?
@ChrisNZ wrote:
Could you just read the first row (with the headers) and count the number of columns?
Gosh. I was thinking so far "outside the box" that your reply made me realize how easy it could be. My bad (I'm new to SAS).
Is there a way to tell SAS to only read (INFILE) the first two rows?
EDIT: I found that I could use the OBS= option (e.g. OBS=2). Sorry for the inconvenience, @ChrisNZ!
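For anyone finding this later, a minimal sketch of reading only the first two rows with the OBS= option on the INFILE statement (the fileref F is an assumption, pointing at the CSV file):

data t;
infile f obs=2 dlm=',' truncover;
input;
* echo each of the two rows to the log ;
put _infile_;
run;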
Even one row is enough.
Pseudo code:
data T;
infile F dlm=',';
input;
call symputx('NB_COLS', countw(_INFILE_,',') );
stop;
run;
Will try it for sure tomorrow morning.
What is this option?
countw
Look up this function, it counts words.
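A quick sketch of what COUNTW does on a header line (the sample header here is made up for illustration):

data _null_;
header = 'name,sex,age,height,weight';
* count the comma-delimited words in the string ;
n = countw(header, ',');
put n=; * writes n=5 to the log ;
run;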
Various approaches:
filename temp temp;
* create csv ;
proc export data=sashelp.class file=temp dbms=csv;
run;
* check csv ;
data _null_;
infile temp;
input;
put _infile_;
run;
* option 1: count delimiters in the header ;
* this assumes there is no edge case of a quoted delimiter in the header ;
data _null_;
infile temp;
input;
call symputx('num_cols',countc(_infile_,',')+1,'G');
stop;
run;
%put &=num_cols;
* option 2: import the header and count columns ;
proc import file=temp out=class dbms=csv replace;
getnames=yes;
run;
data _null_;
if 0 then set class;
length varname $32;
do until (varname eq 'varname');
num_cols+1;
call vnext(varname);
end;
call symputx('num_cols',num_cols-1,'G');
stop;
run;
%put &=num_cols;
* option 3: dictionary tables ;
* this can have a performance issue if you have a lot of allocated librefs, esp. RDBMS ;
proc sql noprint;
select count(0) into :num_cols trimmed
from dictionary.columns
where upcase(libname)='WORK' and upcase(memname)='CLASS';
quit;
%put &=num_cols;
* option 4: proc contents ;
proc contents data=class out=columns (keep=name) noprint;
run;
%let num_cols=%nobs(columns);
%put &=num_cols;
What you do with &num_cols in your downstream code is up to you.
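For instance, one hypothetical use (the expected column count of 5 is made up for illustration) would be to compare &num_cols against what the program expects and write a warning to the log:

%let expected_cols=5;
%macro check_cols;
%if &num_cols ne &expected_cols %then
%put WARNING: file has &num_cols columns but this program expects &expected_cols;
%mend check_cols;
%check_cols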
I once had to do something similar, where I had to read a delimited file created by a web service. The structure could be changed by the customer via a web interface.
In that scenario, there was a max number of columns available from the web service, along with known attributes. But the customer could choose a subset, as well as reorder the columns.
I created a metadata table in the code via datalines (the metadata was a bit like proc contents output). I used approach #1 to get the list of column names from the header, as well as the order of the columns (a sort column). I then inner joined with the metadata to get the known column attributes and the correct order from the header itself, then generated the correct attrib statements and code to import the dynamic data.
HTH...
@ScottBass wrote:
I once had to do something similar, where I had to read a delimited file created by a web service. The structure could be changed by the customer via a web interface.
In that scenario, there was a max number of columns available from the web service, along with known attributes. But the customer could choose a subset, as well as reorder the columns.
I created a metadata table in the code via datalines (the metadata was a bit like proc contents output). I used approach #1 to get the list of column names from the header, as well as the order of the columns (a sort column). I then inner joined with the metadata to get the known column attributes and the correct order from the header itself, then generated the correct attrib statements and code to import the dynamic data.
HTH...
Something like this:
data class;
* reorder columns ;
format sex name weight height age;
set sashelp.class;
run;
* create metadata (pretend this was via datalines) ;
proc contents data=sashelp.class out=metadata (keep=name type length label) noprint;
run;
* create a CSV with a subset of the columns ;
data subset / view=subset;
set class;
keep name age sex;
run;
filename temp temp;
proc export data=subset file=temp dbms=csv replace;
run;
* get the structure of this csv file ;
data columns;
length name $32;
infile temp;
input;
do i=1 to countc(_infile_,',')+1;
name=scan(_infile_,i,',');
sort+1;
output;
end;
keep name sort;
stop;
run;
proc sql;
create table attribs as
select c.name, m.type, m.length, m.label
from metadata m
join columns c
on upcase(m.name)=upcase(c.name)
order by c.sort;
quit;
%macro attribs;
%let label=%trim(&label);
attrib &name
length=%if (&type eq 2) %then $&length; %else &length;
label="&label"
;
%mend;
data final;
%loop_control(control=attribs,mname=attribs);
infile temp dlm=',' dsd firstobs=2;
input (_all_) (:);
run;
https://github.com/scottbass/SAS/blob/master/Macro/loop_control.sas
@ScottBass wrote:
I will keep this in my bookmarks for future usage. Very useful.
I just opened some files dumped by the server and realized that there are no headers. I took a look at the code (written by someone else) and realized that the field names are user-defined. Since the files do not have any headers, do you think I could use the length of the first line as an indicator of the "columns"?
For example, the code for loading those files looks like:
data file1 ;
infile 'myfile.txt' MISSOVER LRECL = 153 ;
input
@1 Field1 10.
@11 Field2 $15.
@26 Field3 $1.
@27 Field4 $1.
@28 Field5 $2.
@30 Field6 $6.
@36 Field7 $60.
@96 Field8 $2.
@98 Field9 9.
@107 Field10 13.
@120 Field11 11.
@131 Field12 12.
@143 Field13 $10.
@153 Field14 $1.
@154 Field15 $3.
;
run ;
Thanks for the help!
For that kind of structure you will probably want to check that the length of the lines is appropriate. For example your posted code is truncating the input lines at 153 bytes and then trying to read data up to byte number 156.
Also check key variables to make sure they have valid values. For example FIELD14 might only allow values of 'A' or 'B'. If something else appears in byte number 153 of the line then it is a good indication that the file is not right.
PS: Never use the antiquated MISSOVER option when using formatted input. If your input line had only 155 bytes, your INPUT of FIELD15 would be missing even if bytes 154 and 155 were not blanks. Use the more robust TRUNCOVER option instead.
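A sketch of both checks combined. The expected record length of 156 and the valid FIELD14 values 'A'/'B' are assumptions for illustration, and only a few of the fields are shown:

data file1;
infile 'myfile.txt' truncover lrecl=200 length=reclen;
* read a record and hold it so we can inspect its length first ;
input @;
* flag records that are not the expected length: a changed layout is likely ;
if reclen ne 156 then put 'WARNING: unexpected record length ' reclen= ' at line ' _n_=;
input
@1 Field1 10.
@153 Field14 $1.
@154 Field15 $3.
;
* sanity-check a key field against its known domain of values ;
if Field14 not in ('A','B') then put 'WARNING: invalid value ' Field14= ' at line ' _n_=;
run;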
There is an INFILE option that will set a variable with the length of the line. You can also test the _INFILE_ automatic variable.
infile 'myfile.txt' length=line_length1 ;
input @;
line_length2=length(_infile_);
line_length3=lengthn(_infile_);
The difference between the two methods is that the first one counts any trailing spaces on the line. The difference between the LENGTH() and LENGTHN() functions shows up when the line contains only blanks: LENGTH() will return 1 and LENGTHN() will return 0.
@Tom wrote:
Hi @Tom ,
I tried to do this :
data t;
infile 'myfile.txt' truncover;
input @;
run;
And it returns no observations, but the OBS column seems to count the total number of lines.
I tried your code, and it gives me 155 in both columns for every line.
However, I was wondering what the usage of this is:
length=line_length1 ;
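As I understand it, LENGTH= on the INFILE statement names a variable that SAS sets to the length of the current input record each time INPUT reads a line. That variable is not written to the output data set automatically, so you copy it into a regular variable if you want to keep it. A minimal sketch (the file path is assumed):

data lengths;
infile 'myfile.txt' length=reclen truncover;
* read a record and hold it; reclen now holds its length ;
input @;
* copy the LENGTH= value into a variable that is written to the data set ;
line_length = reclen;
keep line_length;
run;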