04-11-2018 11:10 PM
04-12-2018 01:33 PM
04-13-2018 02:09 AM
Can you please explain the line below? Is it storage that we can save data in? I thought it was a component.
"there are sites that manage huge amounts of data in SPDS successfully."
It is a SAS component that deals primarily with storage:
Scalable Performance Data Server
It is usually set up with its own pool of physically separate disks to enhance parallel I/O. In times of heavily virtualized SAN storage, it has lost much of its usefulness, at least IMO.
04-14-2018 10:07 AM
Comparing a GRID environment with SPDS is like comparing apples and oranges! You'll be much better off with both working together :-)
- A GRID environment (be it SAS or Hadoop) typically provides:
  - A shared, centrally managed analytic computing environment
  - High availability
  - Workload management to optimally process multiple applications and workloads and maximize overall throughput
  - Flexibility to incrementally grow the computing infrastructure as the number of users and the size of data increase over time
  - The ability to do rolling maintenance and upgrades without any disruption to the user community
- Scalable Performance Data Server (SPD Server):
  - Is a client/server, multi-user data server designed to optimize storage and to speed the processing of large SAS data sets
  - Parallelizes many of the SAS I/O functions, such as WHERE processing and INDEX creation, over multiple data partitions
  - Extends parallel capabilities to include GROUP BY processing and SQL pass-through
  - Requires an SMP machine and is designed to use all resources available on the machine to achieve maximum scalability
  - Gives the maximum benefit when it is run on a machine with:
    - multiple I/O channels
    - a large amount of data to be partitioned
  - Provides a high-performance data store for very large SAS data sets. It is therefore particularly suited to a data warehousing solution where the SAS system is used to construct, manage and analyze enterprise-wide data marts.
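To make the "WHERE processing over multiple data partitions" point a bit more concrete, here is a minimal conceptual sketch in Python. It is not SPD Server code and does not reflect how SPD Server is implemented internally; the function names (`partition`, `scan_partition`, `parallel_where`) and the sample rows are all made up for illustration. The idea it shows is only this: split a table into partitions, filter each partition in its own worker, then merge the matches.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_parts):
    """Split rows round-robin into n_parts lists (stand-ins for partition files)."""
    return [rows[i::n_parts] for i in range(n_parts)]


def scan_partition(part, predicate):
    """Apply a WHERE-style predicate to a single partition."""
    return [row for row in part if predicate(row)]


def parallel_where(rows, predicate, n_parts=4):
    """Filter every partition in its own worker thread and merge the results."""
    parts = partition(rows, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        chunks = pool.map(scan_partition, parts, [predicate] * n_parts)
        return [row for chunk in chunks for row in chunk]


# Hypothetical sample table: 20 rows with an id and an amount.
rows = [{"id": i, "amount": i * 10} for i in range(20)]

# Equivalent in spirit to: WHERE amount >= 150
hits = parallel_where(rows, lambda r: r["amount"] >= 150)
print(sorted(r["id"] for r in hits))
```

In a real system the gain comes from each partition sitting on its own disk or I/O channel, so the scans do not queue behind each other. That is why the separate I/O channels mentioned above matter.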
So to conclude:
Grid --> More Compute, Memory and local storage resources
SPDS --> Better way to handle and process LARGE SAS Tables stored on file systems
Hope this clarifies how these two technologies differ and how they can complement rather than replace each other.
04-16-2018 04:32 PM
04-16-2018 09:31 PM
I measure SAS data sets by their file size (GB) rather than by record count (thousands or millions).
I would consider any data set larger than 7 GB a big data set, and in some cases anything larger than 5 GB. It all depends on your storage and network throughput.
If you have a very wide data set with long text columns, it does not take millions of records to reach 5+ GB in storage, if you see what I mean.
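As a rough back-of-the-envelope check of that last point, here is a small Python sketch. It assumes an uncompressed data set where every row takes the sum of its declared column widths (character columns stored at full declared length, numerics at 8 bytes) and it ignores page and header overhead. The column layouts and the helper name `estimated_size_gb` are hypothetical.

```python
def estimated_size_gb(n_rows, column_widths_bytes):
    """Rough uncompressed size: rows times bytes per row, ignoring file overhead."""
    row_bytes = sum(column_widths_bytes)
    return n_rows * row_bytes / 1024**3


# Hypothetical layouts: 10 numeric columns (8 bytes each),
# with and without three 2000-byte text columns.
narrow = [8] * 10
wide = [8] * 10 + [2000] * 3

n_rows = 900_000  # fewer than a million records

narrow_gb = estimated_size_gb(n_rows, narrow)
wide_gb = estimated_size_gb(n_rows, wide)

print(f"narrow: {narrow_gb:.2f} GB, wide: {wide_gb:.2f} GB")
```

With these assumed widths the wide layout passes 5 GB with under a million rows, while the narrow one stays well below 0.1 GB for the same row count.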
04-15-2018 05:04 AM - edited 04-15-2018 05:07 AM
I agree with the comment from @AhmedAl_Attar. SPDS is a mature product, but it is not legacy at all; it is actually highly recommended in many environments.
One of my customers is actually using SPDE tables to manage large volumes of data in a GRID environment. However, that setup has already become too small for them, and a senior SAS employee recently advised them to move to SPDS. I think @AhmedAl_Attar explains the reasoning behind that quite well.
So your question is really about the difference between HA (High Availability) and performance. They are two different stories, which is why they are hard to compare with each other.