Solved: Big Data and traditional databases - meaning(s) of schema

odesh · Posted 08-02-2019 05:20 AM

Hello,

I am not sure about the meaning of schema in the context of Hadoop. The word schema has a well-defined meaning in a traditional DBMS (Oracle, SQL Server, etc.). Please see the attachment: in this course it seems that a schema leads to rigidity, which is bad. My question is: "How can we remove the rigidity but still preserve some form of hierarchy and control? Is this not necessary?"

Thanks.

Odesh.

Cynthia_sas · Posted 08-02-2019 09:52 AM

Hi: Here's the response from the course instructors:

The intent of the item on the slide is that data can be ingested into Hadoop as-is, without a predefined schema such as the star or snowflake schemas found in traditional relational databases. Hadoop is schema on read, whereas traditional databases are schema on write. Schemas are good, but they impose restrictions when working with big data, 90% of which is considered unstructured, such as IoT (Internet of Things) or social media data.

In traditional databases with schemas, adding data to an existing schema meant that the RDBMS database administrator (DBA) and the IT department had to analyze the data and determine where best to connect it. In our experience working with previous customers, this delay could take weeks or months. The rigid process definitely adds a certain amount of latency to Service Level Agreements.

With Hadoop, you can add any data and define the table schema to read the data later. In fact, you can have several table schemas (columns, data types, sizes) that read the same data. The schema is determined by the intended use case. Hadoop is therefore very flexible in data access and storage formats, and this flexibility gives a data lake tremendous capabilities.
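
To make the schema-on-read idea concrete, here is a minimal sketch in plain Python (not Hadoop or Hive code). The sample records, the column names, and the `read_with_schema` helper are all made up for illustration; the point is only that the raw text is stored once, untouched, and each reader brings its own schema.

```python
import csv
import io

# Raw data landed as-is: nothing was typed or validated at write time.
# (Hypothetical sample records, invented for this sketch.)
RAW = "2019-08-01,sensor-7,21.5\n2019-08-02,sensor-7,22.1\n"


def read_with_schema(raw, schema):
    """Apply a schema, given as (column name, converter) pairs, at read time."""
    rows = []
    for record in csv.reader(io.StringIO(raw)):
        rows.append({name: convert(value)
                     for (name, convert), value in zip(schema, record)})
    return rows


# Schema A: every field is read as text.
text_schema = [("day", str), ("device", str), ("reading", str)]

# Schema B: the same bytes, but the third field is read as a number.
numeric_schema = [("day", str), ("device", str), ("temp_c", float)]

as_text = read_with_schema(RAW, text_schema)
as_numbers = read_with_schema(RAW, numeric_schema)
```

Both calls read the same stored text; only the schema supplied by the reader differs. In a schema-on-write system, by contrast, the column names and types would have to be fixed before the first record could be loaded.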

Hope this helps clarify your questions,

Cynthia

