DATA Step, Macro, Functions and more

Reading Delimited file | number of columns could change each month

Reply
Occasional Contributor
Posts: 10

Reading Delimited file | number of columns could change each month

Hello Everyone,

 

Warm greetings for the day.

I have written a proc import to read a delimted file whose number of columns were constant.

 

Now, the source provider has conveyed that the new columns will keep adding/deleting every month in this delimited file.

So, could you please advise that how do I approach this problem where the number of columns keeps increasing/decreasing every month?

 

I have following approach in mind :

Guess, I can start to maintain a metadata excel, which will store the column names and its SAS data type with length and format .

This excel will act as source to identify the number of columns in the delimited file.

Then writing a macro enabled proc import with column names coming from the excel file ,could solve the issue.

I will work on this approach this weekend.

 

but ,does anyone has any better solution please ?

I will be thankful

Thanks a lot .

kind regards,

Neeraj singh

Super User
Posts: 23,998

Re: Reading Delimited file | number of columns could change each month

Posted in reply to thisisneeraj

You say Excel but delimited file. Those are two very different things, which one are you dealing with?

 

Your process sounds pretty much exactly what's required. 


Assuming delimited file:

 

1. Use PROC IMPORT, setting GUESSINGROWS=MAX to get better results. 

2. Maintain a table of fields. Add a 'check' process that alerts you if there are new/different columns

3. Add a step to convert field types

4. Politely chastise at data provider explaining the amount of work they've added for you - this should really be step 1. And then you go back and do 1 to 3 because you don't really have a choice Smiley Wink

 


thisisneeraj wrote:

Hello Everyone,

 

Warm greetings for the day.

I have written a proc import to read a delimted file whose number of columns were constant.

 

Now, the source provider has conveyed that the new columns will keep adding/deleting every month in this delimited file.

So, could you please advise that how do I approach this problem where the number of columns keeps increasing/decreasing every month?

 

I have following approach in mind :

Guess, I can start to maintain a metadata excel, which will store the column names and its SAS data type with length and format .

This excel will act as source to identify the number of columns in the delimited file.

Then writing a macro enabled proc import with column names coming from the excel file ,could solve the issue.

I will work on this approach this weekend.

 

but ,does anyone has any better solution please ?

I will be thankful

Thanks a lot .

kind regards,

Neeraj singh


 

Super User
Super User
Posts: 9,829

Re: Reading Delimited file | number of columns could change each month

Posted in reply to thisisneeraj

"Now, the source provider has conveyed that the new columns will keep adding/deleting every month in this delimited file." - what does your import agreement contain?  You can code around, or re-do it each import, however it will cause problems all over the place.  Why are the columns changing?  I would bet a £1 that the provider is an Excel user, and due to zero control are just doing it themselves, or that they are using the Excel thinking of putting data across the page, i.e. years going to the right.

Fix the structure of the import data or you will never stop coding/re-coding your process.  Putting the year (in my example) going down the page means that observations change but structure does not - this is how it is done in DB's/SAS and various other tools.

So not this:

ID   2013  2014   2015  2016 2017

1     a        b         c        d       e

 

but this:

ID  YEAR  RESULT

1    2013    a

1    2014    b

...

Super User
Super User
Posts: 8,273

Re: Reading Delimited file | number of columns could change each month

Posted in reply to thisisneeraj

PROC IMPORT does not care how many columns are in the source file. It looks at the file and guesses how many columns there are, what to call the variables and what type length to use for the variables. So if you give a different file the PROC IMPORT code doesn't change but the generated dataset reflects the data that is in it.

 

Did you mean that you had created a data step to read the delimited file?

If the changes are predictable then perhaps your data step can automatically adjust.  For example they might mean that the data has repeated measures and some rows will have more repeats than others.  If so a simple data step can do that automatically if the structure is clear.

 

Did you mean that you had created other steps after the PROC IMPORT step that are assuming that certain variable exist?

If you you might be able to make that code more flexible also.  Perhaps by running PROC CONTENTS on the dataset that PROC IMPORT generated.

Ask a Question
Discussion stats
  • 3 replies
  • 147 views
  • 3 likes
  • 4 in conversation