Best Practices for Extracting and Storing Data in the Cloud by Murali Sastry

As part of a class assignment for an analytics program, I was required to study best practices for cloud computing. Along the way I came across a few nuggets that I thought I would share with the SAS community.


Per Zimmerman (2015), the following are the best practices for extracting and storing data in the cloud:


Develop a Plan to Protect Organizational Data:

Using a cross-functional approach with internal subject matter experts (SMEs) and stakeholders, create a detailed plan to manage data in the cloud. This step is especially beneficial if the organization recently migrated from a traditional in-house data center with servers on company premises. Ensure the plan includes activities and instructions for data access, extraction, and storage, with accountability assigned within the organization: who does what, how often data needs to be accessed, extracted, and stored in the cloud, which backup media to use, how frequently to back up, how to confirm that backups work, and, most importantly, the game plan when the internet connection is spotty. The plan should include a clear accountability schedule (e.g., a Gantt chart) with activities and responsibilities, a liaison with the cloud broker or source backed by a robust service level agreement, and an in-depth cost-risk-benefit analysis of which data needs to be in the cloud versus which data needs to stay in-house.


Handle Hardware with Caution:

Storage media, hard drives, routers, etc. need to be handled with care. These may serve as temporary storage for backups. The plan from the first practice above should dictate how long files may stay on a specific media type as backup, and how to extract and store them (audio and video files when needed, social media information, transactional data, etc.). Verify that the backup works per design intent.
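
One simple way to verify that a backup works is to compare a checksum of the original file with a checksum of the backup copy. The sketch below is only an illustration and is not from Zimmerman (2015); the function names `sha256_of` and `backup_matches` are hypothetical, and it assumes both copies are reachable as local file paths.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks so large media files do not fill memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original: Path, backup: Path) -> bool:
    """True only when the backup exists and is byte-for-byte identical."""
    return backup.exists() and sha256_of(original) == sha256_of(backup)
```

A scheduled job could call `backup_matches` for each file named in the plan and flag any mismatch to the person accountable for backups.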


Redundancy Plan for Backup:

Data storage options are now plentiful and inexpensive. Keep a redundant backup of the data that is in the cloud. Take additional care with sensitive data in the cloud, such as personnel or employee data, customer order data, and supplier contracts. Have a thorough dialog with stakeholders about options for storing sensitive data in-house, especially if the organization has regulatory and statutory compliance requirements, as in health care or defense. Private clouds are another option for this kind of data.


Legal Requirements Verification and Access Options:

To meet customer, legal, regulatory, and statutory requirements, verify employee, customer, and supplier access to data in the cloud. Grant access and privileges on a need-to-know basis from the organization's perspective. In healthcare, banking, legal, and other industries, clients may file lawsuits if it is likely that employees, customers, or suppliers were given access that was not defined in the standard, customer order, or standard operating procedure or guideline. In such sensitive industries it is critical to follow these guidelines carefully.
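
The need-to-know idea can be pictured as a deny-by-default lookup: access is allowed only when a grant is written down. This is a minimal sketch, not a real access-control system; the roles, dataset names, and the `GRANTS` table are all made up for illustration.

```python
# Hypothetical grants: each role maps to the datasets it needs to know about.
GRANTS = {
    "employee": {"orders", "hr_self"},
    "customer": {"orders"},
    "supplier": {"contracts"},
}


def can_access(role: str, dataset: str) -> bool:
    """Deny by default: allow only what is explicitly granted to the role."""
    return dataset in GRANTS.get(role, set())
```

Because an unknown role falls back to an empty set, anyone not covered by a documented grant gets no access, which matches the point above about access that was never defined in a standard or procedure.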


One Location for Data Storage for Simplicity:

Data is much more streamlined and easier to manage if it is in one location or in one format, such as SAS datasets created from disparate data sources (transactional data, audio and video data, social media data, etc.). Having data in one format helps keep it current in one place and makes it much easier to manage from an organizational perspective.


Analyze Storage and Extraction Metrics:

As with other organizational performance review metrics, manage the extraction and storage metrics for the cloud in terms of employee productivity, quality, ease of use, and ease of access, extraction, and storage. Set clear expectations with the cloud services broker and have them maintain dashboards based on the service level agreements, so they can be held accountable for their end of the bargain.
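
As a rough illustration of what such a dashboard might compute, the sketch below summarizes a batch of extraction runs and compares them against service level targets. The `ExtractionRun` record, the `sla_report` function, and the default thresholds are assumptions for the example, not values from any actual agreement.

```python
from dataclasses import dataclass


@dataclass
class ExtractionRun:
    succeeded: bool   # did the extraction complete?
    seconds: float    # how long the extraction took


def sla_report(runs, min_success_rate=0.99, max_avg_seconds=60.0):
    """Summarize runs and check them against assumed SLA thresholds."""
    if not runs:
        raise ValueError("at least one run is required")
    success_rate = sum(r.succeeded for r in runs) / len(runs)
    avg_seconds = sum(r.seconds for r in runs) / len(runs)
    return {
        "success_rate": success_rate,
        "avg_seconds": avg_seconds,
        "meets_sla": success_rate >= min_success_rate
        and avg_seconds <= max_avg_seconds,
    }
```

The real thresholds would come from the service level agreement negotiated with the broker.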


Plan to Recover Lost Data:

It is best to have a navigation tool for metadata and for the types of data that exist in the cloud, along with a contingency plan in case data in a private or public cloud is lost or compromised. Ensure that a Responsible, Accountable, Consulted, and Informed (RACI) matrix is created between the organization and the cloud services broker or source, and honor the agreements. Ensure that a solid plan exists to recover lost data or respond to a data breach, together with a plan for communicating with customers or employees depending on the impact and the types of files affected.
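
A RACI matrix can be kept as simple structured data and checked for gaps. The sketch below uses the common convention that each activity has exactly one Accountable party; the activities, party names, and the function `activities_without_single_accountable` are hypothetical examples.

```python
# Hypothetical RACI matrix: activity -> {party: "R" | "A" | "C" | "I"}.
RACI = {
    "restore from backup": {
        "IT operations": "R", "CIO": "A", "cloud broker": "C", "customers": "I",
    },
    "notify affected customers": {
        "communications": "R", "legal": "C", "cloud broker": "I",
    },
}


def activities_without_single_accountable(matrix):
    """Return activities that do not have exactly one Accountable party."""
    return [
        activity
        for activity, roles in matrix.items()
        if list(roles.values()).count("A") != 1
    ]
```

Here the second activity would be flagged because nobody is marked Accountable, which is the kind of gap worth closing before a breach occurs.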



Zimmerman, David (2015). Best Practices for Data Protection and Recovery in the Cloud [Blog post].
