BookmarkSubscribeRSS Feed
MohamedSaad
Calcite | Level 5

Dear All,

     I need your support and recommendation for my following case,

I'm currently involved in a project which I should apply quality controls over the DWH models. I have the opportunity to choose between DataFlux and DI but I'm not sure which tool will help me in the scenarios I want to apply.

Here are two of the scenarios I want to apply:

- Rule Validation

In this scenario we want to check some rules in the usage stream and draw a graph that describe the rule violation percentage against the whole stream and send this graph by mail. For example if stream contains 100 record and 25 records are violating the rule a graph should be built describing that there’s 25% rule violation.

- Trend Analysis

In this scenario we want to check trend behavior in the last 10 days and if there's any spike in the trend positive or negative a graph should be built and sent by mail showing the last 10 day data and the spike position.

Sorry for the long question but I really need your help.

Best Regards,

Mohamed

13 REPLIES 13
LinusH
Tourmaline | Level 20

Need some more info on your current situation.

Do you have a data model - and is it already being populated by ETL-jobs?

If yes, what tool is being used for that?

How do you plan to do the analysis - on th fly, or on already loaded data?

What kind of rules do you have?

DataFlux and DI are similar in some cases, but have their main focus on different things.

DI is used for ETL development, and can handle simple controls such as "present in lookup table", value within a range etc.

DataFlux have more sophisticated tools more matching strings that follow patterns, data quality management/work-flow etc, but may not be as strong as an ETL tool.

Data never sleeps
MohamedSaad
Calcite | Level 5

Yes we have data model it’s already being populated by IBM DataStage,

I plan to do the analysis on already loaded data after the ETL process

Regarding the rules, let us take an example there's the usage stream which contains the CDR's with it attributes like charged amount, free amount and local or roaming. One of the rules is to check if any CRD has the free amount > rated amount. If this rules has been violated by 10% a pie chart should be drown showing that 10% of the total usage is violating the free charge rule.

LinusH
Tourmaline | Level 20

Seems odd to have an ETL toll for loading the data, and another for anlayzing the already loaded data.

Nevertheless, this sounds like standard reporting, and I'm not sure if either DIS or DataFlux is the appropriate tool. Maybe using Enterprise Guide could suite your needs?

Data never sleeps
MohamedSaad
Calcite | Level 5

So in you opinion what's the best architecture solution for this situation in general?

And why the DataFlux is not useful although it's a quality tool?

LinusH
Tourmaline | Level 20

I've just worked a little with df. But it focus on finding and correcting data on the fly from sources. And that is on a record/column level.

It seems from your example that you are more interested in reporting on a summarized level. But, please discuss this with a SAS sales representative in more detail, it's hard for me to give you a bullet proof recommendation from this sparse information.

But as I said before, maybe creating some reports in EG (which could be scheduled), or crate summary tables that you could do dynamic reporting on, using standard BI tools (from the BI Server suite for instance), could be sufficient for you needs.

Data never sleeps
Reeza
Super User

I agree with Linus, it looks like reporting on the status of a DWH and that would be done in a reporting tool, eg. EG or BASE SAS could work, but SQL could probably do most of that as well.

MS_Egyptian
Fluorite | Level 6

as I see usig DI is the best for your case, as you can even create the reports after applying the rules. as Linus and Reeza recommeded Enterprise Guide is good but your data is huge as it is more than 1,000,000,000 so using good techniques and DI Features you can apply ur rules

Reeza
Super User

Can DI send emails?

LinusH
Tourmaline | Level 20

If you mean the DI Studio client, then no.

But in DI Studio you can build routines/jobs that can send e-mails, which rely on existing email functionality within Base SAS.

Data never sleeps
MS_Egyptian
Fluorite | Level 6

Yes, DI  can send emails.

LinusH
Tourmaline | Level 20

Number of records makes no difference. Both DI and EG use base SAS programming. The difference is the functionality within the user Interface, their intended use, and type of integration with metadata.

Data never sleeps
MS_Egyptian
Fluorite | Level 6

Thank you LinusH, you are right I know Mr. Mohamed Environment that's why I said that he can use DI Studio.

Patrick
Opal | Level 21

To also add my 5 cents to your question:

DF is a great tool but more in the area of data cleansing and standardisation (eg. when dealing with addresses and names).

From what you describe I would probabely use EG for data exploration. I would use DIS when it comes to operationalising things - so stuff you want to run on a regular bases in batch and with a scheduler as part of a bigger work-flow.

Both EG and DIS create in the end SAS code. So you can easily start with EG for prototyping. Once you really know what you need you re-build it in DIS in a more stable, cleaner, performance optimised version.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 13 replies
  • 1685 views
  • 0 likes
  • 5 in conversation