BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JacquesR
Quartz | Level 8

Hi all

 

I have a folder with ±250 Excel files. The exact nature of the files (e.g., the worksheets contained) differs.

I need to import and combine data from all of these workbooks.

 

I started with this, which was quite helpful:

http://support.sas.com/kb/41/880.html

 

What I have so far is (I have shortened paths with the elipsis to declutter, but the full paths appear in the Filename and %Let fileloc statements):

 

 

    options mprint mlogic symbolgen; 

    %MACRO StackFBYT;

        PROC DATASETS; DELETE dirlist;RUN;

        FILENAME _FList PIPE 'dir "C:\Users\...\*.XLSX" /b ';
        %LET _FileLoc = "C:\Users\...\";
        
        DATA dirlist ;
           INFILE _FList LRECL=256 TRUNCOVER;
           INPUT file_name $256.;
        RUN;

        DATA _null_;
           SET dirlist END=end;
           count+1;
           CALL SYMPUTX('read'||PUT(count,4.-l),CATS('C:\Users\...\',file_name));
           IF END THEN CALL SYMPUTX('_FileCount',count);
        run;
        PROC PRINT DATA=dirlist;RUN;
        
        %DO dfCount=1 %TO &_FileCount ;
            DATA _null_;
               CALL SYMPUT ('CurFile',SYMGET(CATS('read',&dfCount)));
	    RUN;

            %IF FINDC(&CurFile,"AU") %THEN %DO;
                %LET NSheets=5;
                %LET _Region="AU";
            %END;    
            %ELSE %IF FINDC(&CurFile,"US") %THEN %DO;
                %LET NSheets=2;
                %LET _Region="US";
            %END;    
            %ELSE %IF FINDC(&CurFile,"WW") %THEN %DO;
                %LET NSheets=2;
                %LET _Region="WW";
            %END;    

            %IF FINDC(&CurFile,"Daily") %THEN %DO;
                %LET _Resolution=daily;
            %END;    
            %ELSE %IF FINDC(&CurFile,"Monthly") %THEN %DO;
                %LET _Resolution=monthly;
            %END;    
            %ELSE %IF FINDC(&CurFile,"Yearly") %THEN %DO;
                %LET _Resolution=yearly;
            %END;    
            
            %DO wsCount=1 %TO &NSheets;
                %IF &wsCount %EQ 1 %THEN %DO;
                    PROC IMPORT DATAFILE=&CurFile OUT=D_&dfCount._&wsCount DBMS=XLSX REPLACE; SHEET="Samplings-facebook";RUN;
                    %LET _term=facebook;
                %END;
etc....

Essentially, I want to scan the file name, and if certain strings are found, store values in relevant macro variables which can then be used to differentially process the importing of the files.

 

When I get to the line

%IF FINDC(&CurFile,"AU") %THEN %DO;

I get this error:

SYMBOLGEN: Macro variable CURFILE resolves to C:\Users\...\(facebook)(youtube)(weather)(cricket)(superannuation);AU;2004-01-01--2005-02-03(Daily);356 samples;2019-02-13.xlsx
ERROR: A character operand was found in the %EVAL function or %IF condition where a numeric
operand is required. The condition was: FINDC(&CurFile,"AU")
 

I have tried this:

%LET Test1=%EVAL(FINDC(%NRBQUOTE(&CurFile),"AU");
%PUT &Test1;

which gives me:

ERROR: Required operator not found in expression:
FINDC(C:\Users\...\(facebook)(youtube)(weather)(cricket)(superannuation);AU;2004-01-01--2005-02-03(Daily);356 samples;2019-02-13.xlsx,"AU")

 

Something seems to go awry when I feed in the full file name to FINDC, and I cannot seem to figure out what it is.

When I manually enter the path (in quotes), it seems as if the editor is not reading it as a quoted string, but I'm not certain that that has anything to do with it. 

 

Any help would be appreciated.

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
ScottBass
Rhodochrosite | Level 12

It's hard for me to tell exactly what you're wanting to do, so these are general comments:

 

1)  Perhaps my %dirlist macro could help?  

https://github.com/scottbass/SAS/blob/master/Macro/dirlist.sas. 

Also download %parmv, or else comment out the calls to %parmv in %dirlist.

 

2) I think you're overusing macro code.  I would build all your metadata in a data step/data set rather than macro code.

 

3) Once you create your metadata, perhaps my %loop_control macro could help?  

https://github.com/scottbass/SAS/blob/master/Macro/loop_control.sas.

 

4) Alternatively, you could use a data _null_ step to generate your PROC IMPORT code, write that code to a file, then %include that file.

 

5) Do you have SAS/Access to PC File Formats licensed?  In particular, does this work:

 

libname xcel excel "C:\Your\Excel\File.xlsx";

If it fails, you can't use that approach.  If it does work, you could spin through your Excel files, and use PROC COPY to copy all the worksheets to SAS datasets.

 

6) I have another macro, which may or may not be useful, depending on #5.  

https://github.com/scottbass/SAS/blob/master/Macro/excel2sas.sas.

 

Hope at least some of this helps...


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.

View solution in original post

14 REPLIES 14
ChrisNZ
Tourmaline | Level 20

1. Try 

 %if %sysfunc(findc(&CurFile,Yearly))

 - Using a SAS function with the macro language requires encapsulating in macro function %sysfunc()
 - No quotes needed as the macro language only deals with strings

 

2. I think you want to use function find() rather than findc(). Please know the functions you use.,
   Even better would be to use macro function %index() instead.

JacquesR
Quartz | Level 8

Hi Chris

 

Thanks for the suggestions. I see my confusion about FINDC.

I am still not getting any joy, though.

Look at some of the results from the data step:

        DATA _null_;
           SET dirlist END=end;
           count+1;
           CALL SYMPUTX('read'||PUT(count,4.-l),CATS(&_FileLoc,file_name));
           %IF %SYSFUNC(FIND(file_name,"AU")) %THEN %DO;
              CALL SYMPUTX('_Region'||PUT(count,4.-l),"AU");                 
              CALL SYMPUTX('_NSheets'||PUT(count,4.-l),5);                 
           %END;
           %ELSE %IF %SYSFUNC(FIND(file_name,"US")) %THEN %DO;
              CALL SYMPUTX('_Region'||PUT(count,4.-l),"US");
              CALL SYMPUTX('_NSheets'||PUT(count,4.-l),2);                 
           %END;
           %ELSE %IF %SYSFUNC(FIND(file_name,"WW")) %THEN %DO;
              CALL SYMPUTX('_Region'||PUT(count,4.-l),"WW");
              CALL SYMPUTX('_NSheets'||PUT(count,4.-l),2);                 
           %END;

            %IF %SYSFUNC(FIND(file_name,"Daily")) %THEN 
                CALL SYMPUTX('_Resolution'||PUT(count,4.-l),"daily");
            %ELSE %IF %SYSFUNC(FIND(file_name,"Monthly")) %THEN 
                CALL SYMPUTX('_Resolution'||PUT(count,4.-l),"monthly");
            %ELSE %IF %SYSFUNC(FIND(file_name,"Yearly")) %THEN 
                CALL SYMPUTX('_Resolution'||PUT(count,4.-l),"yearly");

           IF END THEN CALL SYMPUTX('_FileCount',count);
        run;

MPRINT(STACKFBYT): DATA _null_;
MPRINT(STACKFBYT): SET dirlist END=end;
MPRINT(STACKFBYT): count+1;
SYMBOLGEN: Macro variable _FILELOC resolves to "p:\...\Analysis set\"
MPRINT(STACKFBYT): CALL SYMPUTX('read'||PUT(count,4.-l),CATS("p:\...\Analysis set\",file_name));
MLOGIC(STACKFBYT): %IF condition %SYSFUNC(FIND(file_name,"AU")) is FALSE
MLOGIC(STACKFBYT): %IF condition %SYSFUNC(FIND(file_name,"US")) is FALSE
MLOGIC(STACKFBYT): %IF condition %SYSFUNC(FIND(file_name,"WW")) is FALSE
MLOGIC(STACKFBYT): %IF condition %SYSFUNC(FIND(file_name,"Daily")) is FALSE
MLOGIC(STACKFBYT): %IF condition %SYSFUNC(FIND(file_name,"Monthly")) is FALSE
MLOGIC(STACKFBYT): %IF condition %SYSFUNC(FIND(file_name,"Yearly")) is FALSE

and then this:

        DATA _null_;
           SET dirlist END=end;
           count+1;
           CALL SYMPUTX('read'||PUT(count,4.-l),CATS(&_FileLoc,file_name));
           %IF %SYSFUNC(INDEX(file_name,"AU")) %THEN %DO;
              CALL SYMPUTX('_Region'||PUT(count,4.-l),"AU");                 
              CALL SYMPUTX('_NSheets'||PUT(count,4.-l),5);                 
           %END;
           %ELSE %IF %SYSFUNC(INDEX(file_name,"US")) %THEN %DO;
              CALL SYMPUTX('_Region'||PUT(count,4.-l),"US");
              CALL SYMPUTX('_NSheets'||PUT(count,4.-l),2);                 
           %END;
           %ELSE %IF %SYSFUNC(INDEX(file_name,"WW")) %THEN %DO;
              CALL SYMPUTX('_Region'||PUT(count,4.-l),"WW");
              CALL SYMPUTX('_NSheets'||PUT(count,4.-l),2);                 
           %END;

            %IF %SYSFUNC(INDEX(file_name,"Daily")) %THEN 
                CALL SYMPUTX('_Resolution'||PUT(count,4.-l),"daily");
            %ELSE %IF %SYSFUNC(INDEX(file_name,"Monthly")) %THEN 
                CALL SYMPUTX('_Resolution'||PUT(count,4.-l),"monthly");
            %ELSE %IF %SYSFUNC(INDEX(file_name,"Yearly")) %THEN 
                CALL SYMPUTX('_Resolution'||PUT(count,4.-l),"yearly");

           IF END THEN CALL SYMPUTX('_FileCount',count);
        run;

MPRINT(STACKFBYT): DATA _null_;
MPRINT(STACKFBYT): SET dirlist END=end;
MPRINT(STACKFBYT): count+1;
SYMBOLGEN: Macro variable _FILELOC resolves to "p:\...\Analysis set\"
MPRINT(STACKFBYT): CALL SYMPUTX('read'||PUT(count,4.-l),CATS("p:\...\Analysis set\",file_name));
MLOGIC(STACKFBYT): %IF condition %SYSFUNC(INDEX(file_name,"AU")) is FALSE
MLOGIC(STACKFBYT): %IF condition %SYSFUNC(INDEX(file_name,"US")) is FALSE
MLOGIC(STACKFBYT): %IF condition %SYSFUNC(INDEX(file_name,"WW")) is FALSE
MLOGIC(STACKFBYT): %IF condition %SYSFUNC(INDEX(file_name,"Daily")) is FALSE
MLOGIC(STACKFBYT): %IF condition %SYSFUNC(INDEX(file_name,"Monthly")) is FALSE
MLOGIC(STACKFBYT): %IF condition %SYSFUNC(INDEX(file_name,"Yearly")) is FALSE

For the first three files, these are the filenames:

put file_name;

(facebook)(youtube);US;2004-01-01--2018-12-31(Yearly);15 samples;2019-01-27.xlsx
(facebook)(youtube);WW;2004-01-01--2018-12-31(Yearly);15 samples;2019-02-14.xlsx
(facebook)(youtube)(weather)(cricket)(superannuation);AU;2004-01-01--2018-12-31(Yearly);15 samples
;2019-01-22.xlsx

 

Why is it evaluating FALSE?

 

Oh, b.t.w., I did try INDEX and FIND with and without the quotes on the find_string portions (e.g., AU/US/WW), and the result is the same.

 

Any help would be appreciated.

Tom
Super User Tom
Super User

MLOGIC(STACKFBYT): %IF condition %SYSFUNC(FIND(file_name,"AU")) is FALSE

 

Do your file names really have quotes in them?

That test is looking for a string that is FOUR characters long.  The two letters and the two double quotes.

 

Also you are searching the constant text 

file_name

 

You appear to be confused about what macro code does.  You use macro code to generate code.

 

Follow @ScottBass 's advice and get your list of files into a dataset and use a data step to write the code you want to a text file. 

 

ScottBass
Rhodochrosite | Level 12

It's hard for me to tell exactly what you're wanting to do, so these are general comments:

 

1)  Perhaps my %dirlist macro could help?  

https://github.com/scottbass/SAS/blob/master/Macro/dirlist.sas. 

Also download %parmv, or else comment out the calls to %parmv in %dirlist.

 

2) I think you're overusing macro code.  I would build all your metadata in a data step/data set rather than macro code.

 

3) Once you create your metadata, perhaps my %loop_control macro could help?  

https://github.com/scottbass/SAS/blob/master/Macro/loop_control.sas.

 

4) Alternatively, you could use a data _null_ step to generate your PROC IMPORT code, write that code to a file, then %include that file.

 

5) Do you have SAS/Access to PC File Formats licensed?  In particular, does this work:

 

libname xcel excel "C:\Your\Excel\File.xlsx";

If it fails, you can't use that approach.  If it does work, you could spin through your Excel files, and use PROC COPY to copy all the worksheets to SAS datasets.

 

6) I have another macro, which may or may not be useful, depending on #5.  

https://github.com/scottbass/SAS/blob/master/Macro/excel2sas.sas.

 

Hope at least some of this helps...


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
Tom
Super User Tom
Super User

Note that you can use the XLSX engine with current versions of SAS.  With that there is no need to have Microsoft Office, or even be running SAS on Windows.

JacquesR
Quartz | Level 8

Hi Scott

 

Following Tom's suggestion, I decided to work through the loop control macro, and that did the trick.

My code employing it below:

    %MACRO CreateFileList;

        PROC DATASETS; DELETE dirlist;RUN;

        FILENAME _FList PIPE 'dir "c:\Users\...\Analysis set\*.XLSX" /b ';
        %LET _FileLoc = "c:\Users\...\Analysis set\";

		DATA dirlist ;
            INFORMAT _Region $2. _Resolution $7. _term $14.;
              FORMAT _Region $2. _Resolution $7. _term $14.;
           INFILE _FList LRECL=256 TRUNCOVER;
           INPUT file_name $256.;

			FullName=CATS(&_FileLoc,file_name);
            
            _FileNo+1;
            
		   IF FIND(file_name,"Daily") THEN
			   _Resolution="daily";
		   ELSE IF FIND(file_name,"Monthly") THEN
			   _Resolution="monthly";
		   ELSE IF FIND(file_name,"Yearly") THEN
			   _Resolution="yearly";

		   IF FIND(file_name,"AU") THEN DO;
			   _Region="AU";
			   _NSheets=5;
			END;
		   ELSE IF FIND(file_name,"US") THEN DO;
		   _Region="US";
			   _NSheets=2;
			END;
		   IF FIND(file_name,"WW") THEN DO;
		   _Region="WW";
			   _NSheets=2;
			END;

		   DO i=1 to _NSheets;
                _Sheet=i;
                OutFile1=CATS("D_",_FileNo,"_",_Sheet);
                OutFile2=CATS("D_",_FileNo,"_",_Sheet,"_Tr");
                
                _DataSet+1;
                                
		        IF i=1 THEN DO;
		            _term="facebook";
		        END;    
		        ELSE IF i=2 THEN DO; 
		            _term="youtube";
		        END;    
		        ELSE IF i=3 THEN DO; 
		            _term="cricket";
		        END;    
		        ELSE IF i=4 THEN DO; 
		            _term="weather";
		        END;    
		        ELSE IF i=5 THEN DO; 
		            _term="superannuation";
                END;
                SheetName=CATS("Samplings-",_term);
                OUTPUT;
		    END;
            DROP _NSheets i file_name;
        RUN;
        PROC PRINT DATA=dirlist;RUN;

    %MEND CreateFileList;
    %CreateFileList; RUN;

    LIBNAME DAT "c:\Users\...\Analysis set\";        

    %MACRO ReadData;
        %LET _FullName=%TRIM(&FullName);
        %LET _OutFile1=%TRIM(&OutFile1);
        %LET _OutFile2=%TRIM(&OutFile2);
        %LET _SheetName=%TRIM(&SheetName);
        PROC IMPORT DATAFILE="&&_FullName" OUT=&_OutFile1 DBMS=XLSX REPLACE; SHEET="&&_SheetName";RUN;
        
        %LET NCols=%nvars(&_OutFile1);

        DATA &_OutFile1;
            SET &_OutFile1;
            
            Sample_1=Sampling_1;
             %DO i=2 %TO (&NCols+1)/2;
                %LET c=%EVAL(&i * 2 - 2);
                %LET d=%EVAL(&i * 2 - 1);
                IF MISSING(Sampling_&c) THEN Sampling_&c=0;
                IF MISSING(Sampling_&d) THEN Sampling_&d=0;
                Sample_&i=MAX(Sampling_&c,Sampling_&d);
            %END;
        RUN;

        DATA &_OutFile2;
            INFORMAT _Substance $14. _Scale $7.;
              FORMAT _Substance $14. _Scale $7.;
            SET &_OutFile1;
			ATTRIB Date LABEL='';
            RENAME Date=_Date;
            j=0;
			ARRAY S Sample_1-Sample_%EVAL((&NCols+1)/2);
			DO OVER S;
                j=j+1;
                _Substance="&&_term";
                
                _Scale="&&_Resolution";
				_Region="&&_Region";
				_DataFile=INPUT(&_FileNo,BEST.);
                _Worksheet=INPUT(&_Sheet,BEST.);
                _Extraction=INPUT(&_DataSet,BEST.);
                _Sampling=j;
                _Estimate=S;
                IF _Estimate NE 0 THEN OUTPUT;
            END;
            DROP j 
			 Sampling_1-Sampling_%EVAL(&NCols-1)
			 Sample_1-Sample_%EVAL((&NCols+1)/2);

        RUN;    
        %IF &_DataSet=1 %THEN %DO;
            DATA DAT.D_FBYT; SET &_OutFile2; RUN;
        %END;
        %ELSE %DO;
            DATA DAT.D_FBYT; SET DAT.D_FBYT &_OutFile2; RUN;
        %END;    

    %MEND ReadData;

%loop_control(CONTROL=dirlist,MNAME=ReadData);run;

This worked as needed on a small subset of the data (60 files, ±40,000 data points.

Tomorrow I will try it on the full set.

 

Thanks

Tom
Super User Tom
Super User

Isn't there a pattern to your filenames? Can't you just extract the RESOLUTION and REGION values from the filenames?

Does the YEARLY/MONTHLY/DAILY  appear at random places in the file names?

daily file for au.xlsx
au monthly file.xlsx

For example if your filenames really look like this one from one of your earlier error messages:

C:\Users\...\(facebook)(youtube)(weather)(cricket)(superannuation);AU;2004-01-01--2005-02-03(Daily);356 samples;2019-02-13.xlsx

Then you could find REGION and RESOLUTION using simple SCAN() function calls.

_region=scan(filename,-4,';');
_resolution=scan(filename,-2,'()');

 

JacquesR
Quartz | Level 8

There is something of a patter to the file names, but there are exceptions, and given the number of files, I could choose various approaches.

I could look at multi-file renaming tools (I use Total Commander, so it's quite simple to do that), but for keeping an audit trail, it complicates things by changing file names.

And that is why I decided to follow an approach where I could get what I need from the varying file names in SAS, which is now working.

ScottBass
Rhodochrosite | Level 12

@JacquesR :

 

There are a bunch of questions you haven't addressed.  Off the top of my head:

 

* Do all the workbooks have similar structure?

* Are the worksheet names valid SAS dataset names?

* Are the worksheet names repeated across workbooks?  If so, how do you handle duplicate names, so datasets don't get overwritten?

* Otherwise, what output dataset naming pattern do you want?

* Do all the worksheets have valid SAS column names in row 1?

* If not, how do you intend to name the resultant SAS columns (likely named F1, F2, etc)

* Will your worksheet structure ever change?  For example, will AU workbooks ever have more (or less) than 5 worksheets?

 

I'm not saying don't use macro - the below approach does use macro - but IMO your macro code is convoluted, hard to follow, and hard to maintain.  Although I do admit my excel2sas macro is complex.  At least I've tried to debug it thoroughly - YMMV.

 

I'm on SAS 9.3 so I couldn't use the XLSX engine.  But the attached %excel2sas macro did work in my environment.  If you're on SAS 9.4, try changing the libname _XCEL_ to use the XLSX engine.

 

I threw together a quick test, copying a few Excel workbooks into a directory.  This code imported every worksheet from every workbook in the directory, naming the SAS dataset based on the worksheet name.

 

Perhaps you can run with this, after gaining an understanding of the macros.

 

* get list of Excel files ;
%dirlist(
   dir=\\UNC\path\to\files\Excel
   ,filter=ext='xlsx'
);

* macro to run for every record in dirlist ;
%macro code;
   * need to strip leading and trailing spaces ;
   %let fullname=%sysfunc(strip(&fullname));
   %excel2sas(file=&fullname,getnames=yes);
%mend;

* run %excel2sas for every fullname ;
%loop_control(control=dirlist);

 

From glancing at your code, I'd say you need to fiddle with the dirlist output to derive the correct worksheet names, then feed the correct name into %excel2sas.  Try to derive all your metadata in the datastep, not via macro.

 

Another option is, if the worksheet name can be "translated" by %excel2sas into a valid SAS dataset name, then post-process the import via additional code (proc datasets rename dataset) in the %code macro, after the call to %excel2sas.

 

My advice is to get %excel2sas to 1) work in your environment, and 2) test it with a small subset of your workbooks, and 3) see if post-processing will work, 4) otherwise, CAREFULLY modify the macro.

 

Hope this helps...


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
JacquesR
Quartz | Level 8

Hi Scott

 

First, thanks for trying to debug all the rest of my code, but I must stress two points:

1. I am obviously not the best SAS macro coder, but learning. Thanks for teaching me, but bear with me also.

2. Remember the original objective of my question: I need to get a list of files, and then from each file name, pick up some clues which will tell me how to deal with that workbook.

 

Further from that, but that was not part of my question, I do various calculations on the data, transpose and append the data, etc.

 

Also, thanks to your Loop control macro, I have a solution that satisfies the objective of my posting.

 

I will, though, answer some of your questions:

* Do all the workbooks have similar structure?

No. See my original post. "The exact nature of the files (e.g., the worksheets contained) differs."

 

* Are the worksheet names valid SAS dataset names?

That is immaterial, since my code gives each worksheet, which is imported as a dataset of its own, a unique dataset name.

And even those names are arbitrary, because once transposed and appended, I don't need the original dataset name any more. Further to that, I do store information in variables which tells me where the data came from, so I can actually retrace it.

 

* Are the worksheet names repeated across workbooks?  If so, how do you handle duplicate names, so datasets don't get overwritten?

Yes, the worksheet names are consistent. See my answer above for the rest.

 

* Otherwise, what output dataset naming pattern do you want?

See my answer above.

The arbitrary dataset names are created in my code, and then used in the source macro for the Loop control:

OutFile1=CATS("D_",_FileNo,"_",_Sheet);
OutFile2=CATS("D_",_FileNo,"_",_Sheet,"_Tr");

* Do all the worksheets have valid SAS column names in row 1?

Yes. It wasn't necessary to explain that, because it was not the reason for my post, and has no bearing at all on the question asked.

 

* If not, how do you intend to name the resultant SAS columns (likely named F1, F2, etc)

n/a

 

* Will your worksheet structure ever change?  For example, will AU workbooks ever have more (or less) than 5 worksheets?

Strictly speaking yes, my worksheet structure will change, but the second part of this question refers to workbook structure, not worksheet structure. So I will answer both.

Workbook structure will be the same for all workbooks from the same region, but not across regions. So US and WW have two worksheets that I want, and several worksheets that I don't want. AU has those two, and three others that I want, together with a number of worksheets that I don't want.

Worksheet structure will change in that while Columns A and B will always be consistent, the exact number of columns in each worksheet (even worksheets of the same type) may vary. This is a result of the manner in which the original data were captured. Something that might help explain the code I posted, but was not needed for the objective of my posting (i.e., it was not something I asked about, because it was not something I needed help with) is that from Columns C onwards of each worksheet, data are contained across a pair of columns, with one cell being blank, and one cell containing a value, and which of those is the case is almost impossible to determine without physically examining the file. There is no way that I will be doing that for all those columns in all those worksheets in all those workbooks, which is why i have this code to take the samplings (values spread across two columns) and combine them into samples (one variable for every two columns):

%DO i=2 %TO (&NCols+1)/2;
    %LET c=%EVAL(&i * 2 - 2);
    %LET d=%EVAL(&i * 2 - 1);
    IF MISSING(Sampling_&c) THEN Sampling_&c=0;
    IF MISSING(Sampling_&d) THEN Sampling_&d=0;
    Sample_&i=MAX(Sampling_&c,Sampling_&d);
%END;

But as I intimated before, all this is moot, because I really only had one question, which was, after creating a list of file names, how to capture the information I needed from the file name, and that has been answered.

 

Thank you

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Its very hard to read your code which is shouted at me (avoid using all upper case in code).  But from what I can tell, you may be find it easier using VBA to dump out all the data to a CSV file.  You can do this pretty easy, how exactly you do it depends on how many differences there may be.  Just from a quick search:

https://www.mrexcel.com/forum/excel-questions/82293-append-csv.html

And have that with a list of files:

https://stackoverflow.com/questions/31414106/get-list-of-excel-files-in-a-folder-using-vba

With that you can create a set of compiled CSV files, then have a simple datastep for each of them to read them into SAS.  Reading Excel format directly in SAS - I presume you are using proc import - will just lead to issues as Excel is not a fixed format file.  

JacquesR
Quartz | Level 8

Thanks for the suggestion, RW9, but VBA won't hack it.

I have about 101 million data points in amongst all these files, and I still need to do some manipulation and calculation on the data, and then restructure and merge all of it.

I did look at using PowerQuery to automate the data manipulation, but even PowerQuery was struggling with a single file, let alone the whole lot.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

"but VBA won't hack it" - VBA is Excel, are you saying that Excel cannot handle your file size?

 

"101 million data points in amongst all these files" - Then that is even more important that you get the data out of Excel format and into something useful, a dataset, CSV, JSON, XML etc.  These are standard file formats used in most applications as they are robust, cross platform, and not limited.

 

"I still need to do some manipulation and calculation on the data" - this is done after the read of the data, so nothing to do with my suggestion.  I suggest dumping each Excel file out to a plain text CSV ready for reading into SAS, which once read would then be used for further manipulation.

Astounding
PROC Star
A couple of ideas based in part on suggestions already made....

When searching try it this way:

%if %index(&Curfile, AU) %then %do;

And when assigning, remove the quotes:

%let _region = AU;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 1254 views
  • 0 likes
  • 6 in conversation