odesh
Quartz | Level 8

Hello,

I am not sure what "schema" means in the context of Hadoop. The word has a well-defined meaning in a traditional DBMS (Oracle, SQL Server, etc.). Please see the attachment: in this course it seems that a schema leads to rigidity, which is bad. My question is: how can we remove the rigidity but still preserve some form of hierarchy and control? Is that not necessary?

 

Thanks.

Odesh.

1 ACCEPTED SOLUTION
Cynthia_sas
SAS Super FREQ

Hi: Here's the response from the course instructors:

 

The intent of the item on the slide is that data can be ingested into Hadoop as-is, without a predefined schema such as the STAR or SNOWFLAKE schemas found in traditional relational databases. Hadoop is schema on READ, whereas traditional databases are schema on WRITE. Schemas are good, but they impose restrictions when working with big data, 90% of which is considered unstructured, such as IoT (Internet of Things) or social media data.
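To make the READ versus WRITE distinction concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: an in-memory SQLite table stands in for a traditional RDBMS, a plain list stands in for raw storage such as HDFS, and the `readings` table and the sample records are hypothetical.

```python
import sqlite3

# Schema on write: the table shape is fixed before any data arrives.
# (SQLite is a stand-in for a traditional RDBMS; table and data are hypothetical.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_id INTEGER NOT NULL, temp_c REAL NOT NULL)")

incoming = [
    ("7", "21.5"),              # fits the two-column table
    ("8", "22.1", "humid=40"),  # carries an extra field the table has no column for
]

rejected = []
for record in incoming:
    placeholders = ", ".join("?" for _ in record)
    try:
        conn.execute(f"INSERT INTO readings VALUES ({placeholders})", record)
    except sqlite3.Error:
        # The database refuses data that does not match the predefined schema.
        rejected.append(record)

stored_count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]

# Schema on read: land every record untouched; interpretation is deferred.
# (A list of raw text lines stands in for files in raw storage.)
landing_zone = [",".join(record) for record in incoming]
```

In this sketch the table accepts only the record that matches its shape, while the landing zone keeps both records exactly as they arrived, leaving the question of structure for whoever reads them later.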

 

In traditional databases, to add data to an existing schema, the RDBMS database administrator (DBA) and the IT department had to analyze the data and determine where best to connect it. In our experience working with previous customers, this could take weeks or months. The rigid process definitely adds latency when trying to meet Service Level Agreements.

 

With Hadoop, you can add any data now and define the table schema used to read it later. In fact, you can have several table schemas (columns, data types, sizes) that read the same data. The schema is determined by the intended use case. Hadoop is therefore very flexible in both data access and storage formats, and this flexibility gives a data lake tremendous capabilities.
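As a rough illustration of several schemas reading the same data, the Python sketch below applies two different read-time schemas to one unchanged block of raw text. Everything in it is an assumption made for the example: the sample data, the schema names `SALES_SCHEMA` and `AUDIT_SCHEMA`, and the helper `read_with_schema`. In Hadoop itself this role is typically played by table definitions laid over files already in storage, not by Python code.

```python
import csv
import io
from datetime import date

# Raw data as it landed: no column names, no types, just delimited text (hypothetical sample).
RAW = "1001,2021-03-04,19.99,widget\n1002,2021-03-05,5.00,gadget\n"

# Two hypothetical read-time schemas: ordered (column name, converter) pairs.
SALES_SCHEMA = [
    ("order_id", int),
    ("order_date", date.fromisoformat),
    ("amount", float),
    ("product", str),
]
AUDIT_SCHEMA = [
    ("order_id", str),
    ("order_date", str),
]


def read_with_schema(raw_text, schema):
    """Apply a schema while reading; the stored text itself is never modified."""
    rows = []
    for fields in csv.reader(io.StringIO(raw_text)):
        # zip stops at the shorter sequence, so a narrower schema ignores trailing fields.
        rows.append({name: convert(value) for (name, convert), value in zip(schema, fields)})
    return rows


sales_rows = read_with_schema(RAW, SALES_SCHEMA)  # typed view for analysis
audit_rows = read_with_schema(RAW, AUDIT_SCHEMA)  # narrow, text-only view
```

Each use case gets the columns and types it needs, and adding a third schema later would not require rewriting or reloading the stored data.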

 

Hope this helps clarify your questions,

Cynthia

