BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
iamAbhinav
Fluorite | Level 6

How to check if .csv has data or not. 

My .csv has headers into it, so technically it's not empty. 

I'm trying to import all .csv files from a directory through proc import, but some of my .csv files in directory do not have data, i.e. they have 1st header row but no data. (Blank file check doesn't work).

Now without data Proc Import is throwing error, so do we have any way to check if csv contains data.

 

Any leads around this would help.

 

Cordially,

Abhinav

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Good catch on the missing input statement. i also found another error with regards to using variable "done".

This is of course a check for one specific file. Once it is verified that it works, then you/we can make it dynamic.

Like this:

%macro import_cond(path,infile);
data _null_;
infile "&path./&infile." end=done;
input;
if done or _N_ > 1
then do;
  call symputx('number',_N_);
  stop;
end;
run;

%if &number > 1
%then %do;
proc import
  datafile="&path./&infile."
  out=%scan(&infile,1,.)
  dbms=csv
  replace
;
run;
%end;
%mend;

data _null_;
input path :$30. filename :$50.;
call execute(cats('%nrstr(%import_cond(',path,',',filename,'))'));
datalines;
/folders/myfolders header.csv
/folders/myfolders header_with_data.csv
;

Tested with two datafiles on SAS UE, one with one line, the other with two lines. Only the second was imported:

 
 1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 72         
 73         %macro import_cond(path,infile);
 74         data _null_;
 75         infile "&path./&infile." end=done;
 76         input;
 77         if done or _N_ > 1
 78         then do;
 79           call symputx('number',_N_);
 80           stop;
 81         end;
 82         run;
 83         
 84         %if &number > 1
 85         %then %do;
 86         proc import
 87           datafile="&path./&infile."
 88           out=%scan(&infile,1,.)
 89           dbms=csv
 90           replace
 91         ;
 92         run;
 93         %end;
 94         %mend;
 95         
 96         data _null_;
 97         input path :$30. filename :$50.;
 98         call execute(cats('%nrstr(%import_cond(',path,',',filename,'))'));
 99         datalines;
 
 NOTE:  Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
       real time           0.00 seconds
       cpu time            0.00 seconds
       
 
 NOTE: CALL EXECUTE generated line.
 102        ;
 1         + %import_cond(/folders/myfolders,header.csv)
 
 NOTE: The infile "/folders/myfolders/header.csv" is:
       Dateiname=/folders/myfolders/header.csv,
       Besitzername=root,Gruppenname=vboxsf,
       Zugriffsberechtigung=-rwxrwx---,
       Zuletzt geändert=24. April 2020 13.11 Uhr,
       Dateigröße (Byte)=6
 
 NOTE: 1 record was read from the infile "/folders/myfolders/header.csv".
       The minimum record length was 6.
       The maximum record length was 6.
 NOTE:  Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
       real time           0.00 seconds
       cpu time            0.01 seconds
       
 
 2         + %import_cond(/folders/myfolders,header_with_data.csv)
 
 NOTE: The infile "/folders/myfolders/header_with_data.csv" is:
       Dateiname=/folders/myfolders/header_with_data.csv,
       Besitzername=root,Gruppenname=vboxsf,
       Zugriffsberechtigung=-rwxrwx---,
       Zuletzt geändert=24. April 2020 13.11 Uhr,
       Dateigröße (Byte)=16
 
 NOTE: 2 records were read from the infile "/folders/myfolders/header_with_data.csv".
       The minimum record length was 6.
       The maximum record length was 9.
 NOTE:  Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
       real time           0.01 seconds
       cpu time            0.00 seconds
       
 
 
 NOTE: Unable to open parameter catalog: SASUSER.PARMS.PARMS.SLIST in update mode. Temporary parameter values will be saved to 
 WORK.PARMS.PARMS.SLIST.
 103         /**********************************************************************
 104         *   PRODUCT:   SAS
 105         *   VERSION:   9.4
 106         *   CREATOR:   External File Interface
 107         *   DATE:      24APR20
 108         *   DESC:      Generated SAS Datastep Code
 109         *   TEMPLATE SOURCE:  (None Specified.)
 110         ***********************************************************************/
 111            data WORK.HEADER_WITH_DATA    ;
 112            %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
 113            infile '/folders/myfolders/header_with_data.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
 114               informat header $9. ;
 115               format header $9. ;
 116            input
 117                        header  $
 118            ;
 119            if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
 120            run;
 
 NOTE: The infile '/folders/myfolders/header_with_data.csv' is:
       Dateiname=/folders/myfolders/header_with_data.csv,
       Besitzername=root,Gruppenname=vboxsf,
       Zugriffsberechtigung=-rwxrwx---,
       Zuletzt geändert=24. April 2020 13.11 Uhr,
       Dateigröße (Byte)=16
 
 NOTE: 1 record was read from the infile '/folders/myfolders/header_with_data.csv'.
       The minimum record length was 9.
       The maximum record length was 9.
 NOTE: The data set WORK.HEADER_WITH_DATA has 1 observations and 1 variables.
 NOTE:  Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
       real time           0.00 seconds
       cpu time            0.01 seconds
       
 
 1 rows created in WORK.HEADER_WITH_DATA from /folders/myfolders/header_with_data.csv.
   
   
   
 NOTE: WORK.HEADER_WITH_DATA data set was successfully created.
 NOTE: The data set WORK.HEADER_WITH_DATA has 1 observations and 1 variables.
 NOTE:  Verwendet wurde: PROZEDUR IMPORT - (Gesamtverarbeitungszeit):
       real time           0.06 seconds
       cpu time            0.04 seconds
       
 
 121        
 122        OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 134        

View solution in original post

10 REPLIES 10
Kurt_Bremser
Super User

Do you have XCMD enabled, meaning that you can use the X statement and other tools for external programs?

If yes, does your SAS run on Windows or UNIX/Linux?

Please post the code you use, so we can see where to insert the check. Use the "little running man" to post the code.

iamAbhinav
Fluorite | Level 6

No Kurt,


I'm into a Client Private SAS cloud with very less privileges. 
Pipe and XCMD doen't work in my case.

 

Cordially,

Abhinav

Kurt_Bremser
Super User

You can check in a preliminary data step:

data _null_;
infile "....." end=done;
if end or _N_ > 1
then do;
  call symputx('number',_N_);
  stop;
end;
run;

%if &number > 1
%then %do;
/* import code here */
%end;
iamAbhinav
Fluorite | Level 6

This will not work until we mention input statement. 
And if we add input statement, this would hardcode the check for specific csv file which in-turn will defeat the purpose of a dynamic check.

If you run the code on a file with data, it will still show 0 records were read from the file.

 

FYI, I've good understanding of Base/Advanced SAS. 

 

Cordially,
Abhinav

 

Kurt_Bremser
Super User

Good catch on the missing input statement. i also found another error with regards to using variable "done".

This is of course a check for one specific file. Once it is verified that it works, then you/we can make it dynamic.

Like this:

%macro import_cond(path,infile);
data _null_;
infile "&path./&infile." end=done;
input;
if done or _N_ > 1
then do;
  call symputx('number',_N_);
  stop;
end;
run;

%if &number > 1
%then %do;
proc import
  datafile="&path./&infile."
  out=%scan(&infile,1,.)
  dbms=csv
  replace
;
run;
%end;
%mend;

data _null_;
input path :$30. filename :$50.;
call execute(cats('%nrstr(%import_cond(',path,',',filename,'))'));
datalines;
/folders/myfolders header.csv
/folders/myfolders header_with_data.csv
;

Tested with two datafiles on SAS UE, one with one line, the other with two lines. Only the second was imported:

 
 1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 72         
 73         %macro import_cond(path,infile);
 74         data _null_;
 75         infile "&path./&infile." end=done;
 76         input;
 77         if done or _N_ > 1
 78         then do;
 79           call symputx('number',_N_);
 80           stop;
 81         end;
 82         run;
 83         
 84         %if &number > 1
 85         %then %do;
 86         proc import
 87           datafile="&path./&infile."
 88           out=%scan(&infile,1,.)
 89           dbms=csv
 90           replace
 91         ;
 92         run;
 93         %end;
 94         %mend;
 95         
 96         data _null_;
 97         input path :$30. filename :$50.;
 98         call execute(cats('%nrstr(%import_cond(',path,',',filename,'))'));
 99         datalines;
 
 NOTE:  Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
       real time           0.00 seconds
       cpu time            0.00 seconds
       
 
 NOTE: CALL EXECUTE generated line.
 102        ;
 1         + %import_cond(/folders/myfolders,header.csv)
 
 NOTE: The infile "/folders/myfolders/header.csv" is:
       Dateiname=/folders/myfolders/header.csv,
       Besitzername=root,Gruppenname=vboxsf,
       Zugriffsberechtigung=-rwxrwx---,
       Zuletzt geändert=24. April 2020 13.11 Uhr,
       Dateigröße (Byte)=6
 
 NOTE: 1 record was read from the infile "/folders/myfolders/header.csv".
       The minimum record length was 6.
       The maximum record length was 6.
 NOTE:  Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
       real time           0.00 seconds
       cpu time            0.01 seconds
       
 
 2         + %import_cond(/folders/myfolders,header_with_data.csv)
 
 NOTE: The infile "/folders/myfolders/header_with_data.csv" is:
       Dateiname=/folders/myfolders/header_with_data.csv,
       Besitzername=root,Gruppenname=vboxsf,
       Zugriffsberechtigung=-rwxrwx---,
       Zuletzt geändert=24. April 2020 13.11 Uhr,
       Dateigröße (Byte)=16
 
 NOTE: 2 records were read from the infile "/folders/myfolders/header_with_data.csv".
       The minimum record length was 6.
       The maximum record length was 9.
 NOTE:  Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
       real time           0.01 seconds
       cpu time            0.00 seconds
       
 
 
 NOTE: Unable to open parameter catalog: SASUSER.PARMS.PARMS.SLIST in update mode. Temporary parameter values will be saved to 
 WORK.PARMS.PARMS.SLIST.
 103         /**********************************************************************
 104         *   PRODUCT:   SAS
 105         *   VERSION:   9.4
 106         *   CREATOR:   External File Interface
 107         *   DATE:      24APR20
 108         *   DESC:      Generated SAS Datastep Code
 109         *   TEMPLATE SOURCE:  (None Specified.)
 110         ***********************************************************************/
 111            data WORK.HEADER_WITH_DATA    ;
 112            %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
 113            infile '/folders/myfolders/header_with_data.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
 114               informat header $9. ;
 115               format header $9. ;
 116            input
 117                        header  $
 118            ;
 119            if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
 120            run;
 
 NOTE: The infile '/folders/myfolders/header_with_data.csv' is:
       Dateiname=/folders/myfolders/header_with_data.csv,
       Besitzername=root,Gruppenname=vboxsf,
       Zugriffsberechtigung=-rwxrwx---,
       Zuletzt geändert=24. April 2020 13.11 Uhr,
       Dateigröße (Byte)=16
 
 NOTE: 1 record was read from the infile '/folders/myfolders/header_with_data.csv'.
       The minimum record length was 9.
       The maximum record length was 9.
 NOTE: The data set WORK.HEADER_WITH_DATA has 1 observations and 1 variables.
 NOTE:  Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
       real time           0.00 seconds
       cpu time            0.01 seconds
       
 
 1 rows created in WORK.HEADER_WITH_DATA from /folders/myfolders/header_with_data.csv.
   
   
   
 NOTE: WORK.HEADER_WITH_DATA data set was successfully created.
 NOTE: The data set WORK.HEADER_WITH_DATA has 1 observations and 1 variables.
 NOTE:  Verwendet wurde: PROZEDUR IMPORT - (Gesamtverarbeitungszeit):
       real time           0.06 seconds
       cpu time            0.04 seconds
       
 
 121        
 122        OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 134        
iamAbhinav
Fluorite | Level 6

I agree Kurt.

This can work but I'm writing a production code to move all csv files in a directory after converting them to .sas7bdat and having this kind of check would actually take a lot of CPU usage bandwidth as it would resolve all the variables into input statement. 


I instead got a .txt file from client in same location with all the blank .csv files names mentioned inside it.This way we can save a lot if CPU usage since there would be more than 400 files with variables ranging to hundreds.

 

This way I can read the file and check to import only those which are not mentioned in the .txt file.

 

Thank you Kurt for your time. 

 

Cordially,

Abhinav

 

Kurt_Bremser
Super User

@iamAbhinav wrote:

 

having this kind of check would actually take a lot of CPU usage bandwidth as it would resolve all the variables into input statement. 

 


This is not true AT ALL.

My code reads one line WITHOUT trying to resolve any variables, and immediately stops after reading the second line (if such is present).

Only when it the detects more than one line will it try a proc import.

You should try the code, and take care when reading the log I posted.

Tom
Super User Tom
Super User

The general solution to writing a data step that can handle empty files (texts or datasets) is to test the END= variable BEFORE trying to read the first line/observation.

data _null_;
  if eof then call symputx('nlines',_n_-1);
  infile csv end=eof;
  input;
run;

If you only need to know if the answer is zero, one or two or more then you could add the OBS=2 option to the INFILE statement. 

data _null_;
  if eof then call symputx('nlines',_n_-1);
  infile csv end=eof obs=2;
  input;
run;

Or to know if the value is 0,1,2,..N-1 or N or more then add OBS=N for any integer value of N you want.

 

Ksharp
Super User
proc import ........;
getnames=no;
run;

This would not throw an error even csv file is empty.
Check this imported dataset , if it only contains one obs then the csv file is empty .
iamAbhinav
Fluorite | Level 6
But many of my files have Data and that would hamper my import of data.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 3024 views
  • 2 likes
  • 4 in conversation