DATA Step, Macro, Functions and more

file formats

Reply
Contributor
Posts: 72

file formats

Dear all,

 

I have been struggling with the below file issue. My project is ready to get deployed next week and now we started facing the below issues wih the code.

 

We receive .XLS and .XLSX files from various countries, which looks like attached. Once imported, the default column names (A B C D.... or F1 F2 F3....) will be assigned and then we coded the rest accordingly. till now everything was going good, lately when we import the file, the code is createing an extra column in the first place of the file which is moving the rest of the columns to right thus F1 becomes F2 so on.. and because of this the code is picking incorrect data.

 

I dont understand how this can happen. Please help me understand and kindly advise on solution for this..

 

Thank you!

Respected Advisor
Posts: 2,647

Re: file formats

[ Edited ]

I won't look at the attached .xlsx file, because downloading and opening .xlsx files are security risk. However, the question I have for you is: did you actually look at the .xlsx file that is the problem? What is in column A? Is it the expected data, or is it the unexpected extra column? Can you show us a screen capture of what the first few rows and columns of the .xlsx file which is causing the problem?

 

Long-term, this is a problem using .xlsx files to transmit the data, that the format of the file can change and then your code won't work.

--
Paige Miller
Super User
Posts: 13,008

Re: file formats


@don21 wrote:

Dear all,

 

I have been struggling with the below file issue. My project is ready to get deployed next week and now we started facing the below issues wih the code.

 

We receive .XLS and .XLSX files from various countries, which looks like attached. Once imported, the default column names (A B C D.... or F1 F2 F3....) will be assigned and then we coded the rest accordingly. till now everything was going good, lately when we import the file, the code is createing an extra column in the first place of the file which is moving the rest of the columns to right thus F1 becomes F2 so on.. and because of this the code is picking incorrect data.

 

I dont understand how this can happen. Please help me understand and kindly advise on solution for this..

 

Thank you!


So show the code that is reading the .xls and/or .xlsx files. That statement alone raises concerns because the file structures are different and have different limitations.

 

My guess without an example is that you are 1) using proc import and 2) one or more of the data providers changed the file layout without telling you.

Contributor
Posts: 72

Re: file formats

Hi Ballardw,

 

PFB sample example code that we used: 

 

option obs=max;
proc import datafile= "//<<<Path>>/ABCD.xlsx"
out=ABCD
dbms=xlsx replace;
run;

DATA ABCD1;
SET ABCD ( KEEP = A B H I J P Q R S U); /** after renaming the A B... variables, they will be dropped at the end*/
<<>>
<<>>
run;

Super User
Super User
Posts: 7,845

Re: file formats


@don21 wrote:

Hi Ballardw,

 

PFB sample example code that we used: 

 

option obs=max;
proc import datafile= "//<<<Path>>/ABCD.xlsx"
out=ABCD
dbms=xlsx replace;
run;

DATA ABCD1;
SET ABCD ( KEEP = A B H I J P Q R S U); /** after renaming the A B... variables, they will be dropped at the end*/
<<>>
<<>>
run;


Your PROC IMPORT code is telling SAS to read the variable names from the first row in the file. So if someone changes the first row then the variable names will change.  You could tell SAS not to read the variable names.

 

Your PROC IMPORT code is not telling SAS which sheet to read from the workbook.  What happens if they send one workbookwith three sheets?

 

Super User
Posts: 9,551

Re: file formats

First of all, stop using Excel files and only accept text-based files, either fixed-width or csv.

Second, stop using Excel files and only accept text-based files, either fixed-width or csv.

Third, stop using Excel files and only accept text-based files, either fixed-width or csv.

 

As long as you accept garbage, the stink won't go away.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Contributor
Posts: 72

Re: file formats

Posted in reply to KurtBremser
Hi KurtBremser,

I agree to what you said, unfortunately I don't have the authority to reject garbage. No matter what, I have to automated the crap that I get. With a lot of patience, I did but now this? Please advise.

Regards..
Super User
Posts: 9,551

Re: file formats


@don21 wrote:
Hi KurtBremser,

I agree to what you said, unfortunately I don't have the authority to reject garbage. No matter what, I have to automated the crap that I get. With a lot of patience, I did but now this? Please advise.

Regards..

You have a fundamentally broken process. That needs to be fixed first.

I couldn't maintain 1000+ SAS jobs on my own (which I do) without proper processes.

Make it clear to the people responsible that keeping that mess up will cost MONEY in terms of your unnecessarily wasted time, as you will have to go on fixing and fixing it time and again. While you could do really useful stuff.

Invest the time NOW to set up a proper process (sensible file format, documented structure, agreement between provider and receiver). It will be worth it, a hundred times over. Not exagerrating here.

 

I really mean it. You can quote me on this. And @Reeza probably too.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Super User
Posts: 22,823

Re: file formats

This means the Excel files do not have the same structure. Since Excel is not a database it does’t enforce types. 

This willl definitely be problematic in a production type process.

 

You should convert the file to a CSV and read that in instead. You can control the types/formats as desired and ensure you always have the same data inputted or at least know if you don’t. 

Contributor
Posts: 72

Re: file formats

Hi Reeza,

I receve data for 14 countries. Each country sends atleast 20 excel files (both .xls and .xlsx), If I have to change the codes to CSV conversions at this point, it wont be feasible and lot of things can go wrong while doing this.

Also, the structure of the input files loosk exactly the same including the blank columns. The code itself is creating a blank column sometimes.

I understand that working with excel files is complete wate of time but at this point where my project is ready to go live, I need some solution to fix this..

Please help!
Super Contributor
Posts: 498

Re: file formats


@don21 wrote:

I understand that working with excel files is complete waste of time but at this point where my project is ready to go live, I need some solution to fix this..


If you code does not contain steps to detect and fix excel-files not having the expected structure, your project is not ready to go live. You have to check every variable and ensure that it has the expected type and length.

 

Can you post the code used to read the excel files?

Contributor
Posts: 72

Re: file formats

Posted in reply to andreas_lds

@andreas_lds@Andre

 

Hi, We have checked every possibility that could go wrong, I have taken over this project recently  and the test runs went well t=until now. Now a days the input file formats have been creating issues although they look as as the previous ones. we could not get back tot he providers as we could not spot the difference between the files.. SAS is creating blank columns in between is my best guess in this situation or there must be some internal format issue with the input file which we couldnt spot.

Super User
Posts: 22,823

Re: file formats

Usually, in my experience, this happens if you have formulas or had information and deleted but had a format attached. Excel is still keeping that information somewhere so SAS thinks it has info.

 

If you data is always in specific columns though, you could specify the range start and column end and see if that alone fixes it. 

 

 

Super User
Posts: 22,823

Re: file formats

[ Edited ]

Unfortunately converting to XLSX CSV is the best solution. You could add a check to ensure your data meets the expected format but IMO that’s more work. 

 

There are macros row and scripts that will convert all your XLSX and XLSX to CSV in batch. 

 

Ultimately however, it’s your choice which approach to take. Do consider what happens if you go with the Excel approach and and it decides to add two extra columns next time or a bunch of empty rows. Is your process robust enough to handle that?

 

EDIT: I typed XLSX when I meant CSV. 

 

The script is here and the code is 6 to create a CSV (51 for XLSX in the code).

https://msdn.microsoft.com/en-us/vba/excel-vba/articles/xlfileformat-enumeration-excel

 

 options noxwait noxsync; 
 
%macro convert_files(default=,ext=);
 
data _null_;
file "'&default\temp.vbs'";
put "set xlapp = CreateObject(""Excel.Application"")";
put "set fso = CreateObject(""scripting.filesystemobject"")";
put "set myfolder = fso.GetFolder(""&default"")";
put "set myfiles = myfolder.Files";
put "xlapp.DisplayAlerts = False";
put " ";
put "for each f in myfiles";
put "  ExtName = fso.GetExtensionName(f)";
put "  Filename= fso.GetBaseName(f)";
put "    if ExtName=""&ext"" then";
put "           set mybook = xlapp.Workbooks.Open(f.Path)"; 
put "           xlapp.Visible = false";
put "           mybook.SaveAs ""&default.\"" & Filename & "".csv"", 6";
put "    End If";
put "  Next";
put "  mybook.Close";
put "  xlapp.DisplayAlerts = True";
/* Removes original files */
/*put " FSO.DeleteFile(""&default\*.&ext""), DeleteReadOnly";*/
put " xlapp.Quit";
put " Set xlapp = Nothing";
put " strScript = Wscript.ScriptFullName";
put " FSO.DeleteFile(strScript)"; 
run; 
 
x "cscript ""&default\temp.vbs""";
 
%mend;
 

Source:

http://support.sas.com/kb/43/496.html

Super User
Posts: 13,008

Re: file formats


@don21 wrote:
Hi Reeza,

I receve data for 14 countries. Each country sends atleast 20 excel files (both .xls and .xlsx), If I have to change the codes to CSV conversions at this point, it wont be feasible and lot of things can go wrong while doing this.

Also, the structure of the input files loosk exactly the same including the blank columns. The code itself is creating a blank column sometimes.

I understand that working with excel files is complete wate of time but at this point where my project is ready to go live, I need some solution to fix this..

Please help!

It is real hard to fix code without seeing any code. No code, no fix to code. Simple solution set.

Ask a Question
Discussion stats
  • 20 replies
  • 187 views
  • 14 likes
  • 8 in conversation