BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ChrisNZ
Tourmaline | Level 20

You should not be using 

%if %symexist(tcnt) %then %do;

as the macro should be %local, so always exists.

 

Did you not fix any of the other issues I mentioned either, and lay the bed for more headaches for the person coming after you?

imdickson
Quartz | Level 8

@ChrisNZ

i  Tried running the script and it gave me the list of data that has blank _N_=2.

 

However, when i run it to scan through 30k of excel csv, the programme will roughly takes 2 days based on the time needed for just 1k files that took about 2 hours.

 

This script clearly slows down the entire process.....do you have any idea on how to improve it?

 

 

I have no choice but to skip tcnt when it doesnt exist because i will never know if any of the 30k csv has blank record or not. The file is not generated by me.

Tom
Super User Tom
Super User

What does the program do?  Why does it need to use macro code to do it?  What SAS code is the macro generating?

 

What do you mean by "30K of excel csv"?  

Are there 30,000 different files?  Just reading the list of file names will take a long time.  Can you place them on a RAM disk and read from there?  Or are you talking about how big one of the files is?

 

Are they CSV files (which are simple text files) or Excel files (which is a binary format)?

 

imdickson
Quartz | Level 8

@Tom this is the script that i mentioned in another thread before but this time round is another issue.

This script will first read 30k of seperate csv files in multiple subfolders that will be scanned by recursive macro. 

Once it is read, it will then do some checking on the rows of excel first. For example, If 1st row then assign 1st column to a Variable.

If 2nd row, scan for any value that is not BLANK or TOTAL. However, after running the previous script for a few rounds, i noticed there are errors caused by some CSV that are not having any other value apart from BLANK or TOTAL in 2nd row. This is actually the type of CSV that i should be avoiding to load. Since I've assigned counter for 2nd row and if it has nothing(Apart from BLANK or TOTAL), the counter will not be initialized. I then use symexists to check if the counter for 2nd row(tcnt) exist or not. If it exists, proceed with all other formula calculation for each column and etc. Otherwise, i want to output the filename that tcnt doesnt exists into a table for a full list of bad csv file and i've also included "CONTINUE" at the end of the else statement to skip the current iteration.

 

However, when i make the script to scan for jsut 3 files, it will output the bad csv into a table called blankfilelist. When i make the script to scan for 1000 files which contains 1 or 2 bad csv file, it will never hit the condition of symexists else(which indicates tcnt doesnt exist).

DATA FINALTESTONE;					
STOP;					
length station $10 voltage $10 year 8 month $20 transformer $10					
		Day $20 Date Time MW_Imp MW_Exp MVAR_Imp MVAR_Exp MVA Power_Factor 8;			
	RUN;

data blankfilelist;
stop;
length blankfilename $50.;
run;
	
options nomprint nomlogic nosymbolgen;



%macro list_files(dir,ext);					
	%local filrf rc did memcnt name i;				
	%let rc=%sysfunc(filename(filrf,&dir));				
	%let did=%sysfunc(dopen(&filrf));				
					
	%if &did eq 0 %then				
		%do;			
			%put Directory &dir cannot be open or does not exist;		
					
			%return;		
		%end;			
					
	%do i = 1 %to %sysfunc(dnum(&did));				
		%let name=%qsysfunc(dread(&did,&i));			
					
		%if %qupcase(%qscan(&name,-1,.)) = %upcase(&ext) %then			
			%do;		
				%put &dir/&name;	
				%let file_name =  %qscan(&name,1,.);	
				%put &file_name;	
				/*Alfred added this*/	
			PROC IMPORT OUT=WORK.out/*&i%substr(&name,7,4)*/  DATAFILE= "&dir/&name" 		
		/*excelout*/			
            DBMS=csv REPLACE;		
			GUESSINGROWS=3000;			
			delimiter='09'x;		
			getnames=no;		
		RUN;			
		proc contents data=out/*&i&name*/ noprint out=data_info /*(keep = name varnum)*/;			
		run;			
					
		 data _null_;			
			set data_info;		
			call symputx(compress("col"||VARNUM),compress(NAME));		
			call symputx("cnt",_n_);		
		run;			
					
					
					
					
					
		/*Alfred test cutting macro into this portion*/			
					
		data _null_;			
		set WORK.out/*&i%substr(&name,7,4)*/ (obs=2);			
			if _n_ = 2 then do;		
			tcnt = 0;		
				%do j=1 %to &cnt;	
					if &&col&j not in ("","Total") /*and (&&col&j in ("T1") or &&col&j in ("T3"))*/ /*Alfred add on the right*/ then do;
					trxm = &&col&j;
					call symputx(compress("trxm"||tcnt),compress(trxm));
					call symputx("tcnt",tcnt);
					tcnt+1;
					end;

					
/*					end;*/
				%end;	
			end;		
		run;	

/*data checkcnt;*/
/*set work.out;*/
/*length blankfilename $50;	*/
	%if %symexist(tcnt) %then %do;
		%put tcnt ada;
	
		 			
		/*Alfred added code below*/					
					
		data test/*&i%substr(&name,7,4)*/ (drop=			
		%do k=1 %to &cnt;			
		&&col&k..			
		%end;			
		);			
					
		length station $10 voltage $10 year 8 month $20 transformer $10			
		Day $20 Date Time MW_Imp MW_Exp MVAR_Imp MVAR_Exp MVA Power_Factor 8;			
		format Time hhmm.;			
		set 			
		out/*&i%substr(&name,7,4)*/ /*out4 out5 out6 out7 out8*/ end=last;			
/*			%do z = 1 %to %sysfunc(dnum(&did));	 */	
/*				out&z. end=last;*/	
/*			%end;*/		
/*		by VAR1 VAR2 VAR3 VAR4 VAR5;	*/		
		retain station voltage year month;	
		/* Check if row 2 is empty or not. If empty then skip*/
/*	if &tcnt eq . then do;*/
/*	%put empty csv name is &name;*/
/*	data blankcsv;*/
/*	proc append base=blankcsv data=&*/
/*	end;*/
/* if &tcnt ne . then do;	*/
		if _n_ = 1 /*&tcnt ne 0*/ then do;			
		station = VAR1;			
		voltage = VAR2;			
		year = input(VAR5,4.);			
		month = VAR3;			
		end;			
		if last /*&tcnt ne 0*/ then do;			
		month = strip(put(intnx('month',input(catt(substr(month,1,3),'1960'),monyy.),1),monname10.));			
		if month = "January" then year = year+1;			
		end;			
		 			
		if _n_ > 4 /*&tcnt ne 0*/ then do;			
		Day = VAR1;			
		Date = VAR2;			
		Time = input(VAR3,time.);			
		%do m=0 %to &tcnt;			
		transformer = "&&trxm&m..";			
		MW_Imp = input(VAR%eval(4+%eval(&m*6)),best32.);			
		MW_Exp = input(VAR%eval(5+%eval(&m*6)),best32.);			
		MVAR_Imp = input(VAR%eval(6+%eval(&m*6)),best32.);			
		MVAR_Exp = input(VAR%eval(7+%eval(&m*6)),best32.);			
		MVA = input(VAR%eval(8+%eval(&m*6)),best32.);			
		if MVA < 0.001 then Power_Factor = 0;			
		else Power_Factor = max(MW_Imp,MW_Exp)/MVA;			
					
		output;			
		%end;			
		end;	
/*	end;	*/
		run;			
/*		Alfred added code below*/			
/*		%do y=2 %to %sysfunc(dnum(&did));*/			
					
/*		data finaltest;*/			
/*		set test;*/			
/*		run;*/			
		proc append base=finaltestone data=test FORCE;			
		run;			
				
/*		data finaltest;*/
/*		run;	*/
/*		proc sql;			*/
/*		select * from finaltest;			*/
/*		quit;			*/
/*		%end;	*/		
					
		/*End of alfred test*/			
					
					
					
					
					
					
					
					
					
					
					
					
					
					
/*				data _tmp;	*/
/*					length dir $512 name $100;*/
/*					dir=symget("dir");*/
/*					name=symget("name");*/
/*					path = catx('/',dir,name);*/
/*					the_name = substr(name,1,find(name,'.')-1);*/
/*				run;	*/
					
/*				proc append base=list data=_tmp force;	*/
/*				run;	*/
/*					*/
/*				quit;	*/
					
/*				proc sql;	*/
/*					drop table _tmp;*/
/*				quit;	*/
				%end;
	%else %do;
		%put tcnt hilang;
		data checkcnt;
/*set work.out;*/
			length blankfilename $50;	
			blankfilename = "&name";

/*SYMGET('file_name');*/
		run;
		proc append base=blankfilelist data=checkcnt force;
		run;
		%LET jump=1;
		%if jump =1 %then %do; CONTINUE;
		%end;
/*		output;*/
	%end;	
			%end;		
		%else %if %qscan(&name,2,.) = %then			
			%do;		
				%list_files(&dir/&name,&ext)	
			%end;		
	%end;				
					
	%let rc=%sysfunc(dclose(&did));				
	%let rc=%sysfunc(filename(filrf));				
%mend list_files;					
				
					

					
%list_files(/sasdata/source/tnbt/BIGTEST,xls);

 

 

Here i attached the sample code. 

 

Tom, can you tell me more about "place them on RAM disk" ?

Tom
Super User Tom
Super User

Your program is confused.

 

In the macro you are treating the files as CSV files, but in the macro call you are asking for it to look for XLS files.  Are the files CSV files or XLS files?  If they are XLS files then no wonder it takes PROC IMPORT a long time to try to figure out what they contain if you told it to treat them as CSV files.

 

If they are CSV files then why are you using PROC IMPORT to guess what is in the files?

Don't you know what is in the files?

Even if you don't why not just read everything as character and transform into numbers once you know which parts are numbers?

 

What is the purpose of the testing after the PROC IMPORT?  Are you trying to determine what line contains the variable names?  Does the location of  which row has the variable names change from file to file?  Wasn't that the problem you were having before when they were actual XLS files instead of CSV files?

 

If you want to read the beginning of a CSV and test for a column header named TOTAL then just read the beginning of the file using a data step.  No need to read the whole file to make that test, much less read the file twice like PROC IMPORT will have to do to determine the appropriate variable types for each column.

 

Perhaps you started this process with XLS or XLSX files and so could not program the reading yourself. But now that you have text files to read it would probably be easier to read them in a data step.  You can even read all of the CSV files in a directory in one data step if you want by using a wildcard in the filename specification.

 

Also are these file being generated new every time?  Or could you setup a process to only read in the new files and just keep the results for the old files as SAS datasets?  That should improve the overall performance even if each file is read slowly.

ChrisNZ
Tourmaline | Level 20

1. Performance

I have no idea what your program does.

Two hours seems excessive for processing 1000 files, but as usual, it depends (on the files, the processing, the hardware).

7 seconds per file seems ludicrous to me.

 

2. You don't seem to take my advice into account much.

If you don't strive to write clean programs, you just create problems for yourself, and you'll end up here asking questions more often than necessary.

imdickson
Quartz | Level 8

Hi @ChrisNZ

 

1. I explained what my script about at the reply above to Tom. You may have a look at it.

i agree that 7 secs is long for a csv to load and transform...

 

2. I appreciate and value all your advices Chris. I am also well aware of ditching the _N_=2 (obs=2) into the one you recommend. 

I am also aware of needs of keeping the data clean. Furthermore, I wish i would be able to eliminate all csv file that has missing value(bad csv) so that my programme wont run into problem. However, 

 

You also mentioned that symexists will always be 1 in macro. For that case, if i were to use macro to reset tcnt macro variable, would this be a great workaround for now?

 

I will definitely enhance the entire script after this has been delivered as I am way overdue for delivering. I definitely honor and value all your advice and suggestions Chris.

ChrisNZ
Tourmaline | Level 20

Please look in the log to see where the time is spent.

Is it the calls to proc import that take all the time? 

 

ChrisNZ
Tourmaline | Level 20

I cleaned up your program.

Avoid tabs to post here (and in general imho).

Pretty code makes for legible code. 

data FINALTESTONE;          
  stop;          
  length STATION $10 VOLTAGE $10 YEAR 8 MONTH $20 TRANSFORMER $10          
         DAY $20 DATE TIME MW_IMP MW_EXP MVAR_IMP MVAR_EXP MVA POWER_FACTOR 8;      
run;

data BLANKFILELIST;
  stop;
  length BLANKFILENAME $50.;
run;
  
options nomprint nomlogic nosymbolgen;


%macro list_files(dir,ext);          
  %local file_no col_no trxm_no name file_name filrf rc did name tcnt;        
  %let rc=%sysfunc(filename(filrf,&dir));        
  %let did=%sysfunc(dopen(&filrf));        
          
  %if &did eq 0 %then %do;      
    %put Directory &dir cannot be open or does not exist;    
    %return;    
  %end;      
          
  %do file_no = 1 %to %sysfunc(dnum(&did));        
    %let name=%qsysfunc(dread(&did,&file_no));      
    %if %qupcase(%qscan(&name,-1,.)) = %upcase(&ext) %then %do;    
      %let file_name =  %qscan(&name,1,.);  
      %put &=dir / &=name / &=file_name;  

      proc import out      = WORK.OUT/*&file_no%substr(&name,7,4)*/ 
                  datafile = "&dir/&name"     
                  dbms     = csv 
                  replace;    
        guessingrows=3000;      
        delimiter='09'x;    
        getnames=no;    
      run;      
      proc contents data=OUT/*&file_no&name*/ noprint out=DATA_INFO /*(keep = name varnum)*/;      
      run;      
      data _null_;      
        set DATA_INFO;    
        call symputx(compress("col"||VARNUM),compress(NAME));    
        call symputx("cnt",_N_);    
      run;      
           
      %let tcnt=0;
      data _null_;      
        set WORK.OUT/*&file_no%substr(&name,7,4)*/ (firstobs=2 obs=2);      
        %do col_no=1 %to &cnt;  
           if &&col&col_no not in ("","Total") /*and (&&col&col_no in ("T1") or &&col&col_no in ("T3"))*/ /*Alfred add on the right*/ then do;
             TRXM = &&col&col_no;
             call symputx(compress("trxm"||TCNT),compress(TRXM),'l');
             call symputx("tcnt",TCNT,'l');
             TCNT+1;
           end;
        %end;  
      run;  

      %if &tcnt %then %do;
        %put tcnt ada;
           
        data TEST/*&file_no%substr(&name,7,4)*/ (drop= %do col_no=1 %to &cnt; &&col&col_no %end; );
         
          length STATION $10 VOLTAGE $10 YEAR 8 MONTH $20 TRANSFORMER $10      
                 DAY $20 DATE TIME MW_IMP MW_EXP MVAR_IMP MVAR_EXP MVA POWER_FACTOR 8;      
          format TIME hhmm.;      
          set OUT/*&file_no%substr(&name,7,4)*/ /*out4 out5 out6 out7 out8*/ end=LAST;      
          retain STATION VOLTAGE YEAR MONTH;  
          if _n_ = 1 then do;      
            STATION = VAR1;      
            VOLTAGE = VAR2;      
            YEAR    = input(VAR5,4.);      
            MONTH   = VAR3;      
          end;      
          if LAST /*&tcnt ne 0*/ then do;      
            MONTH    = strip(put(intnx('month',input(catt(substr(MONTH,1,3),'1960'),monyy.),1),monname10.));      
            if MONTH = "January" then YEAR = YEAR+1;      
          end;      
           
          if _n_ > 4 /*&tcnt ne 0*/ then do;      
            Day  = VAR1;      
            Date = VAR2;      
            Time = input(VAR3,time.);      
            %do trxm_no=0 %to &tcnt;      
              TRANSFORMER  = "&&trxm&trxm_no..";      
              MW_Imp       = input(VAR%eval(4+&trxm_no*6),best32.);      
              MW_Exp       = input(VAR%eval(5+&trxm_no*6),best32.);      
              MVAR_Imp     = input(VAR%eval(6+&trxm_no*6),best32.);      
              MVAR_Exp     = input(VAR%eval(7+&trxm_no*6),best32.);      
              MVA          = input(VAR%eval(8+&trxm_no*6),best32.);      
              if MVA < 0.001 then POWER_FACTOR = 0;   
              else POWER_FACTOR = max(MW_Imp,MW_Exp) / MVA;   
              output;  
            %end; 
          end; 
        run;      
        proc append base=FINALTESTONE data=TEST force;      
        run;      
      %end;  %*   %if %symexist(tcnt) %then %do;
      %else %do;
        %put tcnt hilang;
        data CHECKCNT;
          length BLANKFILENAME $50;  
          BLANKFILENAME = "&name";
        run;
        proc append base=BLANKFILELIST data=CHECKCNT force; 
        run;


  /******** THIS BLOCK DOES NOTHING. REMOVE *************  */
        %let jump=1;
        %if jump =1 %then %do; 
           CONTINUE;
        %end;



      %end;  %*   ^%symexist(tcnt) ;
    %end;    %*   %if %qupcase(%qscan(&name,-1,.)) = %upcase(&ext) %then %do;     
    %else %if %qscan(&name,2,.) = %then %do;    
      %list_files(&dir/&name,&ext)  
    %end;    
  %end;        
          
  %let rc=%sysfunc(dclose(&did));        
  %let rc=%sysfunc(filename(filrf));  
 
%mend list_files;          
        
                    
%list_files(/sasdata/source/tnbt/BIGTEST,xls);

 

 

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 23 replies
  • 1734 views
  • 1 like
  • 3 in conversation