How to check if .csv has data or not.
My .csv has headers into it, so technically it's not empty.
I'm trying to import all .csv files from a directory through proc import, but some of my .csv files in directory do not have data, i.e. they have 1st header row but no data. (Blank file check doesn't work).
Now without data Proc Import is throwing error, so do we have any way to check if csv contains data.
Any leads around this would help.
Cordially,
Abhinav
Good catch on the missing input statement. i also found another error with regards to using variable "done".
This is of course a check for one specific file. Once it is verified that it works, then you/we can make it dynamic.
Like this:
%macro import_cond(path,infile);
data _null_;
infile "&path./&infile." end=done;
input;
if done or _N_ > 1
then do;
call symputx('number',_N_);
stop;
end;
run;
%if &number > 1
%then %do;
proc import
datafile="&path./&infile."
out=%scan(&infile,1,.)
dbms=csv
replace
;
run;
%end;
%mend;
data _null_;
input path :$30. filename :$50.;
call execute(cats('%nrstr(%import_cond(',path,',',filename,'))'));
datalines;
/folders/myfolders header.csv
/folders/myfolders header_with_data.csv
;
Tested with two datafiles on SAS UE, one with one line, the other with two lines. Only the second was imported:
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 72 73 %macro import_cond(path,infile); 74 data _null_; 75 infile "&path./&infile." end=done; 76 input; 77 if done or _N_ > 1 78 then do; 79 call symputx('number',_N_); 80 stop; 81 end; 82 run; 83 84 %if &number > 1 85 %then %do; 86 proc import 87 datafile="&path./&infile." 88 out=%scan(&infile,1,.) 89 dbms=csv 90 replace 91 ; 92 run; 93 %end; 94 %mend; 95 96 data _null_; 97 input path :$30. filename :$50.; 98 call execute(cats('%nrstr(%import_cond(',path,',',filename,'))')); 99 datalines; NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit): real time 0.00 seconds cpu time 0.00 seconds NOTE: CALL EXECUTE generated line. 102 ; 1 + %import_cond(/folders/myfolders,header.csv) NOTE: The infile "/folders/myfolders/header.csv" is: Dateiname=/folders/myfolders/header.csv, Besitzername=root,Gruppenname=vboxsf, Zugriffsberechtigung=-rwxrwx---, Zuletzt geändert=24. April 2020 13.11 Uhr, Dateigröße (Byte)=6 NOTE: 1 record was read from the infile "/folders/myfolders/header.csv". The minimum record length was 6. The maximum record length was 6. NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit): real time 0.00 seconds cpu time 0.01 seconds 2 + %import_cond(/folders/myfolders,header_with_data.csv) NOTE: The infile "/folders/myfolders/header_with_data.csv" is: Dateiname=/folders/myfolders/header_with_data.csv, Besitzername=root,Gruppenname=vboxsf, Zugriffsberechtigung=-rwxrwx---, Zuletzt geändert=24. April 2020 13.11 Uhr, Dateigröße (Byte)=16 NOTE: 2 records were read from the infile "/folders/myfolders/header_with_data.csv". The minimum record length was 6. The maximum record length was 9. NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit): real time 0.01 seconds cpu time 0.00 seconds NOTE: Unable to open parameter catalog: SASUSER.PARMS.PARMS.SLIST in update mode. Temporary parameter values will be saved to WORK.PARMS.PARMS.SLIST. 103 /********************************************************************** 104 * PRODUCT: SAS 105 * VERSION: 9.4 106 * CREATOR: External File Interface 107 * DATE: 24APR20 108 * DESC: Generated SAS Datastep Code 109 * TEMPLATE SOURCE: (None Specified.) 110 ***********************************************************************/ 111 data WORK.HEADER_WITH_DATA ; 112 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */ 113 infile '/folders/myfolders/header_with_data.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ; 114 informat header $9. ; 115 format header $9. ; 116 input 117 header $ 118 ; 119 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */ 120 run; NOTE: The infile '/folders/myfolders/header_with_data.csv' is: Dateiname=/folders/myfolders/header_with_data.csv, Besitzername=root,Gruppenname=vboxsf, Zugriffsberechtigung=-rwxrwx---, Zuletzt geändert=24. April 2020 13.11 Uhr, Dateigröße (Byte)=16 NOTE: 1 record was read from the infile '/folders/myfolders/header_with_data.csv'. The minimum record length was 9. The maximum record length was 9. NOTE: The data set WORK.HEADER_WITH_DATA has 1 observations and 1 variables. NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit): real time 0.00 seconds cpu time 0.01 seconds 1 rows created in WORK.HEADER_WITH_DATA from /folders/myfolders/header_with_data.csv. NOTE: WORK.HEADER_WITH_DATA data set was successfully created. NOTE: The data set WORK.HEADER_WITH_DATA has 1 observations and 1 variables. NOTE: Verwendet wurde: PROZEDUR IMPORT - (Gesamtverarbeitungszeit): real time 0.06 seconds cpu time 0.04 seconds 121 122 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 134
Do you have XCMD enabled, meaning that you can use the X statement and other tools for external programs?
If yes, does your SAS run on Windows or UNIX/Linux?
Please post the code you use, so we can see where to insert the check. Use the "little running man" to post the code.
No Kurt,
I'm into a Client Private SAS cloud with very less privileges.
Pipe and XCMD doen't work in my case.
Cordially,
Abhinav
You can check in a preliminary data step:
data _null_;
infile "....." end=done;
if end or _N_ > 1
then do;
call symputx('number',_N_);
stop;
end;
run;
%if &number > 1
%then %do;
/* import code here */
%end;
This will not work until we mention input statement.
And if we add input statement, this would hardcode the check for specific csv file which in-turn will defeat the purpose of a dynamic check.
If you run the code on a file with data, it will still show 0 records were read from the file.
FYI, I've good understanding of Base/Advanced SAS.
Cordially,
Abhinav
Good catch on the missing input statement. i also found another error with regards to using variable "done".
This is of course a check for one specific file. Once it is verified that it works, then you/we can make it dynamic.
Like this:
%macro import_cond(path,infile);
data _null_;
infile "&path./&infile." end=done;
input;
if done or _N_ > 1
then do;
call symputx('number',_N_);
stop;
end;
run;
%if &number > 1
%then %do;
proc import
datafile="&path./&infile."
out=%scan(&infile,1,.)
dbms=csv
replace
;
run;
%end;
%mend;
data _null_;
input path :$30. filename :$50.;
call execute(cats('%nrstr(%import_cond(',path,',',filename,'))'));
datalines;
/folders/myfolders header.csv
/folders/myfolders header_with_data.csv
;
Tested with two datafiles on SAS UE, one with one line, the other with two lines. Only the second was imported:
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 72 73 %macro import_cond(path,infile); 74 data _null_; 75 infile "&path./&infile." end=done; 76 input; 77 if done or _N_ > 1 78 then do; 79 call symputx('number',_N_); 80 stop; 81 end; 82 run; 83 84 %if &number > 1 85 %then %do; 86 proc import 87 datafile="&path./&infile." 88 out=%scan(&infile,1,.) 89 dbms=csv 90 replace 91 ; 92 run; 93 %end; 94 %mend; 95 96 data _null_; 97 input path :$30. filename :$50.; 98 call execute(cats('%nrstr(%import_cond(',path,',',filename,'))')); 99 datalines; NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit): real time 0.00 seconds cpu time 0.00 seconds NOTE: CALL EXECUTE generated line. 102 ; 1 + %import_cond(/folders/myfolders,header.csv) NOTE: The infile "/folders/myfolders/header.csv" is: Dateiname=/folders/myfolders/header.csv, Besitzername=root,Gruppenname=vboxsf, Zugriffsberechtigung=-rwxrwx---, Zuletzt geändert=24. April 2020 13.11 Uhr, Dateigröße (Byte)=6 NOTE: 1 record was read from the infile "/folders/myfolders/header.csv". The minimum record length was 6. The maximum record length was 6. NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit): real time 0.00 seconds cpu time 0.01 seconds 2 + %import_cond(/folders/myfolders,header_with_data.csv) NOTE: The infile "/folders/myfolders/header_with_data.csv" is: Dateiname=/folders/myfolders/header_with_data.csv, Besitzername=root,Gruppenname=vboxsf, Zugriffsberechtigung=-rwxrwx---, Zuletzt geändert=24. April 2020 13.11 Uhr, Dateigröße (Byte)=16 NOTE: 2 records were read from the infile "/folders/myfolders/header_with_data.csv". The minimum record length was 6. The maximum record length was 9. NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit): real time 0.01 seconds cpu time 0.00 seconds NOTE: Unable to open parameter catalog: SASUSER.PARMS.PARMS.SLIST in update mode. Temporary parameter values will be saved to WORK.PARMS.PARMS.SLIST. 103 /********************************************************************** 104 * PRODUCT: SAS 105 * VERSION: 9.4 106 * CREATOR: External File Interface 107 * DATE: 24APR20 108 * DESC: Generated SAS Datastep Code 109 * TEMPLATE SOURCE: (None Specified.) 110 ***********************************************************************/ 111 data WORK.HEADER_WITH_DATA ; 112 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */ 113 infile '/folders/myfolders/header_with_data.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ; 114 informat header $9. ; 115 format header $9. ; 116 input 117 header $ 118 ; 119 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */ 120 run; NOTE: The infile '/folders/myfolders/header_with_data.csv' is: Dateiname=/folders/myfolders/header_with_data.csv, Besitzername=root,Gruppenname=vboxsf, Zugriffsberechtigung=-rwxrwx---, Zuletzt geändert=24. April 2020 13.11 Uhr, Dateigröße (Byte)=16 NOTE: 1 record was read from the infile '/folders/myfolders/header_with_data.csv'. The minimum record length was 9. The maximum record length was 9. NOTE: The data set WORK.HEADER_WITH_DATA has 1 observations and 1 variables. NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit): real time 0.00 seconds cpu time 0.01 seconds 1 rows created in WORK.HEADER_WITH_DATA from /folders/myfolders/header_with_data.csv. NOTE: WORK.HEADER_WITH_DATA data set was successfully created. NOTE: The data set WORK.HEADER_WITH_DATA has 1 observations and 1 variables. NOTE: Verwendet wurde: PROZEDUR IMPORT - (Gesamtverarbeitungszeit): real time 0.06 seconds cpu time 0.04 seconds 121 122 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 134
I agree Kurt.
This can work but I'm writing a production code to move all csv files in a directory after converting them to .sas7bdat and having this kind of check would actually take a lot of CPU usage bandwidth as it would resolve all the variables into input statement.
I instead got a .txt file from client in same location with all the blank .csv files names mentioned inside it.This way we can save a lot if CPU usage since there would be more than 400 files with variables ranging to hundreds.
This way I can read the file and check to import only those which are not mentioned in the .txt file.
Thank you Kurt for your time.
Cordially,
Abhinav
@iamAbhinav wrote:
having this kind of check would actually take a lot of CPU usage bandwidth as it would resolve all the variables into input statement.
This is not true AT ALL.
My code reads one line WITHOUT trying to resolve any variables, and immediately stops after reading the second line (if such is present).
Only when it the detects more than one line will it try a proc import.
You should try the code, and take care when reading the log I posted.
The general solution to writing a data step that can handle empty files (texts or datasets) is to test the END= variable BEFORE trying to read the first line/observation.
data _null_;
if eof then call symputx('nlines',_n_-1);
infile csv end=eof;
input;
run;
If you only need to know if the answer is zero, one or two or more then you could add the OBS=2 option to the INFILE statement.
data _null_;
if eof then call symputx('nlines',_n_-1);
infile csv end=eof obs=2;
input;
run;
Or to know if the value is 0,1,2,..N-1 or N or more then add OBS=N for any integer value of N you want.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.