BookmarkSubscribeRSS Feed
don21
Quartz | Level 8

option obs=max;
proc import datafile= "//<<path>>/ABCD.xlsx"
out=ABCD
dbms=xlsx replace;
run;


DATA ABCD1;
FORMAT Current_Excess BEST.;
SET ABCD ( KEEP = A B H I J P Q R S U);
RETAIN RM_NAME Entity_Name RM_CODE Limit_Product_Type Account_ReferenceNo System_ID;
IF INDEX (A, "Name :")>0 THEN RNAME = B;
IF INDEX (A, "Customer :")>0 THEN Entity = B;
IF INDEX (A, "Code :")>0 THEN CODE = B;
IF INDEX (A,"Product :") THEN Limit_Product = B;
IF INDEX (A,"Id :") THEN ReferenceNo = B;
IF INDEX (A,"Id :") THEN ID = B;
CURRENCY =P;
Current_Excess = Q;
/*Current_Excess = INPUT (Current_Exces,BEST.);*/
NO_OF_DAYS_InExcess =U ;
Excess_Sin = S;
CODE = I;
NAME = J;
EXCESS_SINCE = INPUT (EXCESS_SIN, COMMA8.);
EXCESS_SINCE = EXCESS_SINCE - 21916;
FORMAT EXCESS_SINCE DATE11.;
LENGTH FILENAME $50. REPORT $30.;
FILENAME = "ABCD";
REPORT = "ABCD";
IF CURRENCY IN (" ", "Fin Ccy") THEN DELETE;
DROP A B H I J S P Q R U EXCESS_SIN;
RUN;

 

PROC SORT DATA = ABCD1 OUT = ABCD_FINAL noduprecs;
BY NAME;
RUN;

 

 

Sample code for a file....

Patrick
Opal | Level 21

@don21

Like others already wrote Excel is unsuitable for data exchange. But I also understand that this might be out of your control. You will just need to be very clear with your client/manager that there is quite a big likelihood for ongoing production issues with such source data.

 

Comparing the sample Excel with your code demonstrates one of the issues: In your code you're reading NO_OF_DAYS_InExcess from column U while in your sample Excel it's on column V.

 

When dealing with Excels and especially when the sheet actually contains reports and is not just a data sheet, you need to code with much more checks and balances than what your code sample demonstrates.

 

Don't trust on which column a report starts. Analyse the "raw" data first to determine the structure (i.e. search on which column you find a token like "Client ID" - just something which is unique and gives you the starting point.

 

Client ID should also give you the left top corner of a table. So next step for me would be to analyse the table headings on this row and determine if they follow an expected structure. Only if that's validated I'd go and map the data to my output columns.

 

Basically: You need much more analysis of the source structure and you need much more validation than what the code you've posted demonstrates.

 

I've been once in a similar situation. My approach was to first convert the Excels to .CSV as @Reeza proposes, then implement all the checks and balances and create a .csv structure as I'd really would have needed it as a data source. After this step I could go for a clean ETL process.

This allowed me to separate "the mess" from the rest of the process and as I've been optimistic also communicate back to the client what data exchange format they should aim for in a future release (and then hoping the only change required to my code could be to remove the "messy" conversion step). ....I believe this never happened but it certainly helped operationally to separate issues with unreliable data sources from the rest of the process

don21
Quartz | Level 8

Sure, Thank you Patric.. I requested my clients if the could give the inputs in any other formats.. They suggested .RPT. Well, I have never used that before, tried to google for syntax but no use. Any views on how to do that? because, if this is good, atleast I can buy sometime and do the modifications in my project.

 

Many Thanks..

andreas_lds
Jade | Level 19

"An RPT file is a report or output file created by Crystal Reports" (https://fileinfo.com/extension/rpt)

 

This file-format is even worse than Excel.

Patrick
Opal | Level 21

@don21

In the end of the day your client is the king and if you can help them to shape their data into a form where they can get value out of it then this is your job. This is clearly a business driven project with people in it who don't have too much understanding of data organization. But that's likely why they've engaged with you. You're the expert and you can do it for them.

 

The one thing you probably should do to protect yourself: Write down some sort of interface specification (not IT style but made up business style) where you define the few elements which are essential for your code to work (like sheet name, the name of the "tokens" you use in your code to find the starting point of data, and the name of the column headings) - and have your customer confirm that they can deliver on this. This won't only you, it will also help your customer to understand what they have to pay attention to.

 

The comparable (forecasting) project I've been in: It was ugly from an ETL perspective. Quite a few of the Excel extracts where actually SAP reports and accessing SAP directly would have been the clean way to go. But that was just not possible in the context (SAP owned by HQ, project run by a country agency). ...But then: Once I was able to create a "clean data warehouse" my customer was able to add that much value to the available data (much better forecasts) that the approach went global (big multinational) and then of course things got built much cleaner. I hadn't really been involved in this stage anymore but still: What started as a big ugly mess ended as something which I consider as one of the successes in my professional career.

 

don21
Quartz | Level 8
Hi Patrick,

Thank you for the insight, I value the same too. I have been trying my best to bring in the best practices. I hope for the best in this case. Thank you.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 20 replies
  • 2075 views
  • 14 likes
  • 8 in conversation