I need help converting an XLSX file to a true CSV file (i.e., a CSV file that doesn't contain any Excel formatting or highlighting). I don't have SAS/ACCESS to PC files.
I write a lot of programs for data management to detect dirty data. When the report doesn't produce any results, we aren't sure if it is because the data is clean or if my programming is wrong. When the report is blank, I export the associated SAS datasets and give them to the data manager to update. I will read in the modified file and run that data through the program to see if the programming is correct.
My problem is that if I read in a straight Excel file, SAS likes to pick and choose what formats and data types to apply to variables, which in turn causes errors in my code or causes more programming to get the right types, formats, and lengths applied to the test data. My workaround was to just create a new dataset by setting the original table and then manually changing the fields that were updated by the DM. For example:
data new;
set original.old;
if subjid='101-001-005' then weight=400;
if subjid='101-002-003' then sex='F';
run;
However, there are times when the DM makes a LOT of changes to many datasets, and it is too time consuming to manually enter every single update. In those cases, I prefer to read in the data. Since I mentioned above that SAS likes to automatically assigns data types and formats, I have found that I can save the XLSX file as a CSV and then read in a CSV file by using the meta data from the original library to assign the exact same data type, formats, and lengths.
That was a long background, but here is the root of my problem. I wrote some VBscript that reads all the XLSX files in a folder and converts them to CSV. If I were to manually open an XLSX file, I can "save as" CSV. This CSV file will loose any Excel formatting (like highlighting, Excel formats, etc.). But my problem is that in my code, it doesn't seem to save as a true CSV file. Instead, it keeps all the highlighting as though it is still an XLSX file with a CSV extension. When I then read in this newly converted CSV file, it read in as jumbled characters.
Attached is the original XLSX file that was updated by a DM. I tried attaching the CSV file and the jumbled SAS dataset, but I couldn't get them to attach. I can try to email those individually if you think you can help.
Can someone help me find out how I can convert XLSX to a true CSV?
Here is the VB script that I am using.
%macro convert_files(default=,inext=,outext=);
data _null_;
file "'&default\temp.vbs'";
put "set xlapp = CreateObject(""Excel.Application"")";
put "set fso = CreateObject(""scripting.filesystemobject"")";
put "set myfolder = fso.GetFolder(""&default"")";
put "set myfiles = myfolder.Files";
put "xlapp.DisplayAlerts = False";
put " ";
put "for each f in myfiles";
put " ExtName = fso.GetExtensionName(f)";
put " Filename= fso.GetBaseName(f)";
put " if ExtName=""&inext"" then";
put " set mybook = xlapp.Workbooks.Open(f.Path)";
put " xlapp.Visible = false";
put " mybook.SaveAs ""&default.\"" & Filename & "".&outext"", 51";
put " End If";
put " Next";
put " mybook.Close";
put " xlapp.DisplayAlerts = True";
put " xlapp.Quit";
put " Set xlapp = Nothing";
put " strScript = Wscript.ScriptFullName";
put " FSO.DeleteFile(strScript)";
run;
x "cscript ""&default\temp.vbs""";
%mend convert_files;
%convert_files(default=C:\Users,inext=xlsx,outext=csv);
... View more