BookmarkSubscribeRSS Feed
robeldritch
Obsidian | Level 7

Thanks for the reply, Patrick. If I had more time, I would implement a solution with one of those other applications. Since I'm most familiar with SAS, if I can get it all done (by "it", I mean all of the data transformation and appending this project requires after reading in the files, aside from reading in the files themselves) with just clicking the "Run" button in a single instance of SAS, which is what I've got now, then that's what I'm going with. I appreciate the other solutions y'all have presented in this thread using those other applications (VBS, Powershell, ChatGPT), and I'll certainly look to those for future projects involving this kind of purpose. I'm just not familiar with those, and for me to get as familiar with, for example, VBS, as I would need to get in order to make that script work (to be clear, I've never worked with VBS programming before) would take more time than I have to devote to this project at the moment.

Patrick
Opal | Level 21

@robeldritch 

I've heard this "no time" argument already way too often. I agree that not everything needs to be "perfect" but I also believe you need to always ask yourself what level of robustness you need.

For an adhoc job your DDE would be good enough for me but when it comes to production jobs that run regularly then spending the time upfront is imho a must. If you don't then it's only a question of time until something falls over and then it's a production issue with all the visibility, loss of customer trust and loss of your reputation. ...plus investigation and fixing things will likely add much more time than trying to "get it right" from start.

 

Also be aware that besides of everything else that already has been said DDE is really slow. 

 

Creating and calling a .vbs for conversion of Excel to csv isn't hard. With the right prompts chatGPT provided me with code that I could use almost "as is". It's even given me the syntax how to call the .vbs with parameter passing. The vbs is basically automation of Excel "Save As... csv".

sheet.SaveAs targetPath, 6     REM 6 refers to the CSV format

 

Not sure if it's still worth it but because I've got something working here the code:

/** sample call **/
/* 1. create driver table with source paths to excel and libref for target tables */
%let source_dir=C:\temp\xlsx2csv;
%let target_lib=WORK;
data ExcelsToRead;
  infile %tslit(dir /B "&source_dir\*.xls*") pipe truncover end=last;
  input ExcelWorkbook $100.;
  if upcase(scan(ExcelWorkbook,-1,'.')) in ('XLSX','XLSM');
  target_lib="&target_lib";
run;
  
/* 2. call macro once per source Excel */
filename codegen temp;
data _null_;
  file codegen;
/*  file print;*/
  set ExcelsToRead;
  length cmd $300;
  cmd=cats('%convertExcelwb2SAStbl(source_file=%nrstr(',"&source_dir\",ExcelWorkbook,')');
  cmd=cats(cmd,',outlib=',target_lib,');');
  put cmd;
run;

%include codegen /source2;
filename codegen clear;

And here the macro that gets called and does all the work:

Spoiler
%macro convertExcelwb2SAStbl(
  /* required parameters */
  source_file=
  /* optional parameters */
  ,outlib=WORK
  ,csv_folder=       /* WARNING! All .csv files under this folder will get deleted. Default points to folder under WORK. */
  ,vbscript=
  ,vbscript_gen=yes
  ,delObsAllMissing=no
  );

  %local work_path;
  %let work_path=%sysfunc(pathname(work));

  %if %nrbquote(&csv_folder) = %nrbquote() %then %let csv_folder=&work_path\csv_folder;
  %if %nrbquote(&vbscript)   = %nrbquote() %then %let vbscript  =&work_path\excel2csv.vbs;

  /** create vb script for conversion of Excel workbook to csv files (save as... csv) **/
  %global vbscript_gen_flg;
  %if %upcase(&vbscript_gen)=YES and %nrbquote(&vbscript_gen_flg) ne %nrbquote(1) %then
    %do;
      %macro gen_vbs(
        outpath=&vbscript
        );
        data _null_;
        /*  file print;*/
          file "&outpath";
          put
            'REM Usage:' /
            'REM cscript "C:\path\to\vbs\ConvertXlsxToCsv.vbs" "C:\path\to\source.xlsx" "C:\path\to\target\folder"' /

            'REM Get the arguments' /
            'Dim args' /
            'Set args = WScript.Arguments' /

            'If args.Count <> 2 Then' /
            '    WScript.Echo "Usage: cscript ConvertXlsxToCsv.vbs <source path> <target folder>"' /
            '    WScript.Quit 1' /
            'End If' /

            'Dim sourcePath, targetFolder' /
            'sourcePath = args(0)' /
            'targetFolder = args(1)' /

            'REM Create Excel application' /
            'Dim excelApp' /
            'Set excelApp = CreateObject("Excel.Application")' /

            'REM Disable Excel alerts' /
            'excelApp.DisplayAlerts = False' /

            'REM Open the source workbook' /
            'Dim workbook' /
            'Set workbook = excelApp.Workbooks.Open(sourcePath)' /

            'REM Loop through each sheet in the workbook' /
            'Dim sheet, sheetName, targetPath' /
            'For Each sheet In workbook.Sheets' /
            '    sheetName = sheet.Name' /
            '    REM Set the target path for each sheet' /
            '    targetPath = targetFolder + "\" + sheetName + ".csv"' /
            '    REM Save the sheet as CSV' /
            '    sheet.SaveAs targetPath, 6 REM 6 refers to the CSV format' /
            'Next' /

            'REM Close the workbook' /
            'workbook.Close False' /

            'REM Quit Excel' /
            'excelApp.Quit' /

            'REM Clean up' /
            'Set sheet = Nothing' /
            'Set workbook = Nothing' /
            'Set excelApp = Nothing' /

            'WScript.Echo "Conversion complete. CSV files saved to " + targetFolder'
            ;
      %mend;
      %gen_vbs();
      %let vbscript_gen_flg=1;
    %end;

  /* create target folder &csv_folder for csv files if not exist */
  %if %sysfunc(fileexist(&csv_folder))=0 %then
    %do;
      data _null_;
        infile %tslit(mkdir "&csv_folder") pipe;
        input;
        put _infile_;
      run;
    %end;
  /* if folder &csv_folder exists: delete any csv file potentially created by an earlier run */
  %else
    %do;
      data _null_;
        infile %tslit(del /Q "&csv_folder\*.csv") pipe;
        input;
        put _infile_;
      run;
    %end;

  /* execute vbs: creates csv files from Excel Workbook under folder &csv_folder */
  data _null_;
    infile %tslit(cscript "&vbscript" "&source_file" "&csv_folder") pipe;
    input;
    put _infile_;
  run;

  /* populate macro vars with names of generated csv files */
  data work.__csv_files;
    infile %tslit(dir /B "&csv_folder\*.csv") pipe truncover end=last;
    input csv_file $35.;
    call symputx(cats('csv_file_',_n_),cats('%nrstr(',csv_file,')'),'l');
    call symputx(cats('out_tbl_',_n_), cats("'",substr(csv_file,1,length(csv_file)-4),"'n"),'l');
    if last then
        call symputx('n_csvfiles',_n_,'l');
  run;

  /** read generated csv files into SAS tables **/
  %do i=1 %to &n_csvfiles;
    /* read csv files into SAS table                                          */
    /* - use proc import to automatically determine required variable lengths */
    /* - read header row as data so all variables become character            */
    proc import 
      datafile=%tslit(&csv_folder\&&csv_file_&i)
      out=work.__csv_data
      dbms=csv
      replace;
      getnames=no;
      datarow=1;
      guessingrows=max;
    run;

    /* populate macro vars with code to rename variables with values from first row (=header from csv file) */
    data _null_;
      set work.__csv_data(obs=1) nobs=__nobs;
      array __charvar{*} _character_;
      do __i=1 to dim(__charvar);
        call symputx(cats('ren_var_',__i),cats(vname(__charvar[__i]),"='",__charvar[__i],"'n"),'l');
      end;
      call symputx('N_ren_var',dim(__charvar),'l');
    run;

    /* create final output table with table and variable names as per Excel sheet */
    /* - skip first obs with header names                                         */
    /* - execute generated code for renaming variables                            */
    data &outlib..&&out_tbl_&i;
      set work.__csv_data;
      if _n_<2 then delete;
      %if %upcase(&delObsAllMissing)=YES %then
        %do;
          if cmiss(of __charvar[*])=dim(__charvar) then delete;
        %end;
      /* generate variable rename code */
      %do k=1 %to &N_ren_var;
        rename &&ren_var_&k;
      %end;
    run;
    proc datasets lib=work nolist nowarn;
      delete __csv_data;
    quit;
  %end;

  /** housekeeping **/
  proc datasets lib=work nolist nowarn;
    delete __csv_files;
  quit;

%mend;

If it was me then I'd eventually would still put in a bit more work for a production job and add additional functionality and checks. For example a parameter that allows to request the target tables being created in sub-folders with the name of the Excel workbook. 

The potential current "challenge" when creating tables for multiple Excel files is that the same sheet name results in the same table name and though if storing all under the same folder/libref there is a risk that tables get overwritten. Right now provide different librefs for &outlib if there is such a risk.

 

I've already put in some work for testing with special characters like & so they don't create issues (in path, file, tab and column names). I'd put in more unit test work if it was me to release the macro into Test/Prod.

 

 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 842 views
  • 6 likes
  • 6 in conversation