DATA Step, Macro, Functions and more

base sas

Reply
Contributor
Posts: 72

base sas

Dear all,

 

I have 3 issues that are bugging me on my current project:

 

We have developed coding for some 20 countries based on the input files they share via common sharepoint. We take those inputs and import them in our coding and do the rest of the process. 

Problems:

1. The file names of the input files are changing every day.

2. there are empty "COLUMNS" in the input files which we are deleting manually

3. for few variables, the data type is Char one day and is num the other day, however teh data doesnot change. but this is creating a problem in the merge statements later in the code.

 

Please advise..

 

Thank you!

 

 

Contributor
Posts: 72

Re: base sas

there is a point 4
4. the data row starts from 4th observation and we are unable to give "namerow" or datarow as these are .xls and .xlsx files. We can not specify range because the end row is not fixed, sometimes there will be 100 rows and sometimes it could be 200 rows.
Super User
Posts: 22,850

Re: base sas

This is a process problem not a data problem. Don’t try and code your way out of bad design. 

 

Ok...assuming you can’t change the process, start by making a table that has the structure you want. 

Import the excel file and write a clean up routine that checks the type matches from your master table and if not converts it. Then add it to your data. 

 

Columns not in your master are ignored. 

 

Use OS commands to get the file names, or check the SAS macro appendix for sample code on listing files.

Contributor
Posts: 72

Re: base sas

Hi Reeza, 

 

am afraid that making a table structre for all the inputs and for multiple countries is not a flexible idea as it jus makes the code complex. And yes, I am actually looking for a macro code that picks up the file names. Smiley Happy

 

 

Super User
Posts: 22,850

Re: base sas


don21 wrote:

Hi Reeza, 

 

am afraid that making a table structre for all the inputs and for multiple countries is not a flexible idea as it jus makes the code complex. And yes, I am actually looking for a macro code that picks up the file names. Smiley Happy

 

 


Cleaning up the data ahead of time will simplify your processes downstream. Otherwise you're basically just pushing the issue downstream and I guarantee you won't be able to combine the files. If this is a one time process then duct tape it all you want, but if you ever need to run this again you're going to have fun. 

Contributor
Posts: 72

Re: base sas

I have actually written a macro to pick all the file names in the path and it creates a data set with all the file names.

No the challenge is to write another macro which opens that data set , reads each file name and imports it. This needs to be done like a loop as I have nearly 200 file names.. Any advuse?

Super User
Super User
Posts: 7,856

Re: base sas

What have you tried?

Break your problem into parts.

Part 1 is reading the filenames.  Sounds like that should NOT be an issue in this case.

Part 2 is creating a macro that takes as input the name of a file and produces a SAS dataset.  This is where you can try to put a solution to the problem caused by reading unstable input files.  There are many threads on this that have ideas for this.  If the real problem is that sometimes empty columns cause cause wrong variable types then perhaps you could include one of the solutions for dropping empty variables.

 

Once you have those solved then you just need a simple data step to read the dataset from Part1 and use it call the macro created to solve part2. So assuming you have a macro named PART2_MACRO that takes a filename as input and you have dataset named FILENAMES with a variable named FILENAME then the data step would look like this.

data _null_;
  set filenames ;
  call execute(cats('%nrstr(%part2_macro)(',filename,')'));
run;
Contributor
Posts: 72

Re: base sas


data _null_;

set FILENAMES;

call execute('proc import datafile="C:\Reports\EPD\Countries\Botswana\Input\'||strip(memname)||'.xls" out='||strip(memname)||';

run;');

run;



the problem is , files names are a complete crap, they have spaces, underscores, '-' s braces etc., I am getting error stating invalid filename.. Also, the data does not start from row1 they start from 5 or 6 in some file and there are also empty columns in some excel files which needs to be deleted.. Please advise
Super Contributor
Posts: 498

Re: base sas

There is just one thing i can recommend: change the filenames. There is no need to have anything except the letters a-z, numbers, underscore and one period to separate name from extension in filenames. Anything else is fine inside of documents, but keep that crap out of filenames. Just think about the time it takes now to extend a working program, just because you have to handle bad named files.

Contributor
Posts: 72

Re: base sas

Posted in reply to andreas_lds
actually, we r not thinking of doing this manually as we have 20 folder containing atleast 10 files each like this.. soo..
Contributor
Posts: 72

Re: base sas

Posted in reply to andreas_lds
proc import datafile="C:\Reports\EPD\Countries\Botswana\Input\Finance_Overdue*.xlsx"
out=Botswana4
dbms=xlsx replace;
run;
-----> Works WELL!!
proc import datafile="C:\Reports\EPD\Countries\Botswana\Input\excesstrn*.xls"
out=Botswana5
dbms=XLS replace;
run;
---> Does not work and says - "Too many XF records for Excel ->...." but my files are fine please suggest any other ways to import ".xls files like above using name*.xls
Super User
Super User
Posts: 7,856

Re: base sas


don21 wrote:
proc import datafile="C:\Reports\EPD\Countries\Botswana\Input\Finance_Overdue*.xlsx"
out=Botswana4
dbms=xlsx replace;
run;
-----> Works WELL!!
proc import datafile="C:\Reports\EPD\Countries\Botswana\Input\excesstrn*.xls"
out=Botswana5
dbms=XLS replace;
run;
---> Does not work and says - "Too many XF records for Excel ->...." but my files are fine please suggest any other ways to import ".xls files like above using name*.xls

Is this post an attempt to answer the original question in some way?  If not then post it as a new question.

You cannot pass more than one input filename to PROC IMPORT.  I suspect that your first pattern only matches one filename and so proc import is smart enough to figure that out and import the file.

Contributor
Posts: 72

Re: base sas

this is part of the original question and is stilll not an answer, I requested for a solution to one of the above points..
Super User
Posts: 13,046

Re: base sas


don21 wrote:
proc import datafile="C:\Reports\EPD\Countries\Botswana\Input\Finance_Overdue*.xlsx"
out=Botswana4
dbms=xlsx replace;
run;
-----> Works WELL!!
proc import datafile="C:\Reports\EPD\Countries\Botswana\Input\excesstrn*.xls"
out=Botswana5
dbms=XLS replace;
run;
---> Does not work and says - "Too many XF records for Excel ->...." but my files are fine please suggest any other ways to import ".xls files like above using name*.xls

If you ever want to combine these data sets then you cannot rely on Proc Import. Each time proc import is called it is guessing as to type and length of data. If the internal layout, except for the number of header rows, is consistent you would be much better off converting the files to CSV and then using a data step to read the converted file. The data step would set consistent variable names, types and lengths.

 

Depending on what you varying number of header rows looks like you could likely even test if your row is data or header.

 

But if this project is I have 50 different file formats, I never know what format anything is in, then automation is out of the question.

 

Other issues is you may have people misnaming files XLS that are actually XLSX, XML, HTML, CSV and who knows what.

And from the overall question I sincerely doubt "but my files are fine" is true.

Super User
Super User
Posts: 7,856

Re: base sas


don21 wrote:

data _null_;

set FILENAMES;

call execute('proc import datafile="C:\Reports\EPD\Countries\Botswana\Input\'||strip(memname)||'.xls" out='||strip(memname)||';

run;');

run;



the problem is , files names are a complete crap, they have spaces, underscores, '-' s braces etc., I am getting error stating invalid filename.. Also, the data does not start from row1 they start from 5 or 6 in some file and there are also empty columns in some excel files which needs to be deleted.. Please advise

Strange filenames will cause two separate problems with code like that, the input filename and output dataset name. 

To fix the problem with the filenames confusing your simple quoting of the filename you can use the QUOTE() function.

To fix the dataset name you need to do a little more work. For example you could test if the basename before the .XLS is a valid membername and if not replace it with a generated name.

 

 

data _null_;
   length memname $32 filename $512 ;
   set FILENAMES;
   memname = scan(filename,-2,'\.');
   if not nvalid(memname) then memname=cats('FILE',_n_);
   call execute(catx(' '
   ,'proc import datafile='
   ,quote(trim(filename))
   ,'out=',memname,'(label=',quote(trim(filename)),');'
   ,'run;'
   ));
run;
Ask a Question
Discussion stats
  • 17 replies
  • 169 views
  • 0 likes
  • 5 in conversation