Hi,
I'm trying to write a SAS code where I can count how many files exist in that folder. However, I'm struggling to execute the logic behind the code. In the scenario the output name =" dps_cais ". the output name is base on different names, which is why it's in a macro variable. I'm trying to work out a solution to flag up as fail if it does equal the filename (see the code below) which should be always be 3 per output_name.
data dps_file_count;
set WORK.FILENAME;
WHERE filename like "%DPSCAIS%" ;
data dps_file_count;
set DPS_FILE_COUNT;
WHERE filename NOT like "%Rossen%";
/*Creating a output name loop unitl it matches the filename */
data check_filenames;
set DPS_FILE_COUNT;
flag = filename;
do until (flag = filename or eof);
set DPS_FILE_COUNT end = eof;
if filename = "&output_name._DPSCAIS_Interactive_Workbook_&date..xlsm"
or filename = "&output_name. DPSCAIS append_&date..csv"
or filename = "&output_name._DPSCAIS_Summary_Report_&date..xlsx" then flag = 1;
end;
run;
/*check what files is missing*/
PROC SQL;
create table file_compare as
SELECT b.filename from check_filenames b
where filename not in (select filename from dps_file_count a where a.filename = b.filename);
quit;
PROC SQL;
create table COUNT_FILES AS SELECT COUNT(filename) AS COUNTS FROM file_compare
;
DATA CHECK_NUM_FILES;
SET file_compare;
IF COUNTS >= 1 THEN FILE_COUNT = "FAIL" ;ELSE FILE_COUNT = "PASS";
There are plenty of threads in this forum which show how to read the names of the files in a folder into a SAS data set. Then, the number of records in the resulting data set is what you want.
For most situations, you need to write the above code without macros and without macro variables FIRST, before trying to turn it into something the requires macros and requires macro variables. So, please show us code for one folder that works without macros and without macro variables, before trying to turn this into a more general piece of code.
these are the files I managed to read from the folder.
DPS_File_count table
Filename |
DPS_cais_DPSCAIS_Interactive_Workbook_20230629.xlsm |
DPS_cais_DPSCAIS_Summary_Workbook_20230629.xlsm |
DPS_Check_filenames table
Filename | Filename | Flag |
DPS_cais_DPSCAIS_Interactive_Workbook_20230629.xlsm | DPS_cais_DPSCAIS_Interactive_Workbook_20230629.xlsm | 1 |
DPS_cais_DPSCAIS_Summary_Workbook_20230629.xlsm | DPS_cais_DPSCAIS_Summary_Workbook_20230629.xlsm | 1 |
File_compare table (blank)
In the loop part of the codes the filename DPS_cais_DPSCAIS_append_20230629.csv is missing in the table which meant that the result will flag as pass instead of fail if that make sense
Okay, so first you talk about "count" the number of files. You said "I'm trying to write a SAS code where I can count how many files exist in that folder."
Now you write about something that isn't about counting, and in fact I'm not sure what it is you have just explained or how it relates to counting files, but there is a flag being computed and I don't think that has been explained.
Furthermore, I asked to see code without macros and without macro variables that works on one folder and does what you want. I still would like to see that code before I feel comfortable moving forward. In fact, I still consider such code a mandatory first step to write more generic code with macros and with macro variables.
I do not follow what you are trying to do.
You seem to be saying you found 2 files, both of which are XLSM files where the only difference is one of the many "words" in the named. Summary vs Interactive.
What is the source of the information that let's you know that there is a missing file with a CSV extension? It looks like perhaps you want a CSV file with a similar name, only with the strings 'Summary_Workbook' or 'Interactive_Workbook' in the name changed to the string 'appended' instead.
Is that what you are trying to test?
Do you expect all of the files to follow those naming conventions of:
A_B_C_X_Y_D.xlsm
A_B_C_Z_D.csv
Where A is DPS, B is cais, C is DPSCAIS, X is Interactive or Summary, Y is Workbook and D is 20230629 and Z is appended?
Which parts of ABCDXYZ vary?
So the purpose of the checks is to identify if the correct number of excel files has been outputted in the folder.
The output name always changes depending on the input name. I’ve already built this macro variable in so that it scan the input name. For every input name e.g. A we expect to see three files outputted:
The only thing that changes is the macro variable output name and date . If there is two Input file with input name A&B we should see six files.
"&output_name._DPSCAIS_Interactive_Workbook_&date..xlsm"
or filename = "&output_name. DPSCAIS append_&date..csv"
or filename = "&output_name._DPSCAIS_Summary_Report_&date..xlsx"
I think I managed to find the solution where I can create a table base of all the output filenames at that particular time and compare it with the what’s in the folder.
@Coding4you wrote:
So the purpose of the checks is to identify if the correct number of excel files has been outputted in the folder.
The output name always changes depending on the input name. I’ve already built this macro variable in so that it scan the input name. For every input name e.g. A we expect to see three files outputted:
The only thing that changes is the macro variable output name and date . If there is two Input file with input name A&B we should see six files.
"&output_name._DPSCAIS_Interactive_Workbook_&date..xlsm" or filename = "&output_name. DPSCAIS append_&date..csv" or filename = "&output_name._DPSCAIS_Summary_Report_&date..xlsx"
I think I managed to find the solution where I can create a table base of all the output filenames at that particular time and compare it with the what’s in the folder.
Look very closely at that file name for the CSV. Do you see the space between the . and DPSCAIS? and before append. The = comparison means that unless the actual file name has that space then the code won't (and likely shouldn't spaces in filenames are a bad idea) match.
Thank you, I managed to compress the file so it’s like this now:
DPS_cais_DPSCAIS_Interactive_Workbook_20230629.xlsm DPS_cais_DPSCAIS_Summary_Workbook_20230629.xlsm DPS_cais_DPSCAISappend_20230629.csv
would a loop like this work?
data check_filenames;
set DPS_FILE_COUNT;
flag = 0; /* Initialize flag to 0 */
/* Loop over the records */
do until (eof);
set DPS_FILE_COUNT end = eof;
/* Check if the filename matches all conditions */
if filename = "DPS_cais_DPSCAIS_Interactive_Workbook_&date..xlsm" and
filename = "DPS_cais_DPSCAIS_append_&date..csv" and
filename = "DPS_cais_DPSCAIS_Summary_Report_&date..xlsx" then
do;
flag = 1; /* Set flag to 1 to indicate a "pass" */
output; /* Output the record to the check_filenames dataset */
end;
end;
run;
So if the idea is that there should be all three of those filenames then you probably need three flags (or one flag that counts).
data _null_;
set DPS_FILE_COUNT end=eof;
retain flag 0;
if filename in
("DPS_cais_DPSCAIS_Interactive_Workbook_&date..xlsm"
"DPS_cais_DPSCAIS_append_&date..csv"
"DPS_cais_DPSCAIS_Summary_Report_&date..xlsx"
) then flag+1
;
if eof then do;
put 'NOTE: Found ' flag "files for &date..";
call symputx('nfiles',flag);
end;
run;
%put NOTE: Found &nfiles files for &date.. ;
Now you can use the value of the macro variable NFILES to decide whether or not to do something else.
@Coding4you wrote:
Thank you, I managed to compress the file so it’s like this now:
DPS_cais_DPSCAIS_Interactive_Workbook_20230629.xlsm DPS_cais_DPSCAIS_Summary_Workbook_20230629.xlsm DPS_cais_DPSCAISappend_20230629.csv
would a loop like this work?
data check_filenames; set DPS_FILE_COUNT; flag = 0; /* Initialize flag to 0 */ /* Loop over the records */ do until (eof); set DPS_FILE_COUNT end = eof; /* Check if the filename matches all conditions */ if filename = "DPS_cais_DPSCAIS_Interactive_Workbook_&date..xlsm" and filename = "DPS_cais_DPSCAIS_append_&date..csv" and filename = "DPS_cais_DPSCAIS_Summary_Report_&date..xlsx" then do; flag = 1; /* Set flag to 1 to indicate a "pass" */ output; /* Output the record to the check_filenames dataset */ end; end; run;
Not with AND. When you check a single variable for multiple values it will never be true for AND condition. A value cannot be 123 AND ABC at the same time.
Personally I would use
flag = filename in( "DPS_cais_DPSCAIS_Interactive_Workbook_&date..xlsm", "DPS_cais_DPSCAIS_append_&date..csv" , "DPS_cais_DPSCAIS_Summary_Report_&date..xlsx" );
IN tests if a variable is one of multiple values. The result of any logical comparison in SAS is 1 or 0 so you can assign values this way skipping the if/then to assign 1/0 when true/false.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.