SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Quality Control

Reply
New Contributor
Posts: 3

Quality Control

Dear All,

     I need your support and recommendation for my following case,

I'm currently involved in a project which I should apply quality controls over the DWH models. I have the opportunity to choose between DataFlux and DI but I'm not sure which tool will help me in the scenarios I want to apply.

Here are two of the scenarios I want to apply:

- Rule Validation

In this scenario we want to check some rules in the usage stream and draw a graph that describe the rule violation percentage against the whole stream and send this graph by mail. For example if stream contains 100 record and 25 records are violating the rule a graph should be built describing that there’s 25% rule violation.

- Trend Analysis

In this scenario we want to check trend behavior in the last 10 days and if there's any spike in the trend positive or negative a graph should be built and sent by mail showing the last 10 day data and the spike position.

Sorry for the long question but I really need your help.

Best Regards,

Mohamed

Super User
Posts: 5,434

Re: Quality Control

Posted in reply to MohamedSaad

Need some more info on your current situation.

Do you have a data model - and is it already being populated by ETL-jobs?

If yes, what tool is being used for that?

How do you plan to do the analysis - on th fly, or on already loaded data?

What kind of rules do you have?

DataFlux and DI are similar in some cases, but have their main focus on different things.

DI is used for ETL development, and can handle simple controls such as "present in lookup table", value within a range etc.

DataFlux have more sophisticated tools more matching strings that follow patterns, data quality management/work-flow etc, but may not be as strong as an ETL tool.

Data never sleeps
New Contributor
Posts: 3

Re: Quality Control

Yes we have data model it’s already being populated by IBM DataStage,

I plan to do the analysis on already loaded data after the ETL process

Regarding the rules, let us take an example there's the usage stream which contains the CDR's with it attributes like charged amount, free amount and local or roaming. One of the rules is to check if any CRD has the free amount > rated amount. If this rules has been violated by 10% a pie chart should be drown showing that 10% of the total usage is violating the free charge rule.

Super User
Posts: 5,434

Re: Quality Control

Posted in reply to MohamedSaad

Seems odd to have an ETL toll for loading the data, and another for anlayzing the already loaded data.

Nevertheless, this sounds like standard reporting, and I'm not sure if either DIS or DataFlux is the appropriate tool. Maybe using Enterprise Guide could suite your needs?

Data never sleeps
New Contributor
Posts: 3

Re: Quality Control

So in you opinion what's the best architecture solution for this situation in general?

And why the DataFlux is not useful although it's a quality tool?

Super User
Posts: 5,434

Re: Quality Control

Posted in reply to MohamedSaad

I've just worked a little with df. But it focus on finding and correcting data on the fly from sources. And that is on a record/column level.

It seems from your example that you are more interested in reporting on a summarized level. But, please discuss this with a SAS sales representative in more detail, it's hard for me to give you a bullet proof recommendation from this sparse information.

But as I said before, maybe creating some reports in EG (which could be scheduled), or crate summary tables that you could do dynamic reporting on, using standard BI tools (from the BI Server suite for instance), could be sufficient for you needs.

Data never sleeps
Super User
Posts: 19,851

Re: Quality Control

Posted in reply to MohamedSaad

I agree with Linus, it looks like reporting on the status of a DWH and that would be done in a reporting tool, eg. EG or BASE SAS could work, but SQL could probably do most of that as well.

Occasional Contributor
Posts: 14

Re: Quality Control

Posted in reply to MohamedSaad

as I see usig DI is the best for your case, as you can even create the reports after applying the rules. as Linus and Reeza recommeded Enterprise Guide is good but your data is huge as it is more than 1,000,000,000 so using good techniques and DI Features you can apply ur rules

Super User
Posts: 19,851

Re: Quality Control

Posted in reply to MS_Egyptian

Can DI send emails?

Super User
Posts: 5,434

Re: Quality Control

If you mean the DI Studio client, then no.

But in DI Studio you can build routines/jobs that can send e-mails, which rely on existing email functionality within Base SAS.

Data never sleeps
Occasional Contributor
Posts: 14

Re: Quality Control

Yes, DI  can send emails.

Super User
Posts: 5,434

Re: Quality Control

Posted in reply to MS_Egyptian

Number of records makes no difference. Both DI and EG use base SAS programming. The difference is the functionality within the user Interface, their intended use, and type of integration with metadata.

Data never sleeps
Occasional Contributor
Posts: 14

Re: Quality Control

Thank you LinusH, you are right I know Mr. Mohamed Environment that's why I said that he can use DI Studio.

Respected Advisor
Posts: 4,173

Re: Quality Control

Posted in reply to MohamedSaad

To also add my 5 cents to your question:

DF is a great tool but more in the area of data cleansing and standardisation (eg. when dealing with addresses and names).

From what you describe I would probabely use EG for data exploration. I would use DIS when it comes to operationalising things - so stuff you want to run on a regular bases in batch and with a scheduler as part of a bigger work-flow.

Both EG and DIS create in the end SAS code. So you can easily start with EG for prototyping. Once you really know what you need you re-build it in DIS in a more stable, cleaner, performance optimised version.

Ask a Question
Discussion stats
  • 13 replies
  • 571 views
  • 0 likes
  • 5 in conversation