🔒 This topic is solved and locked.
sat_lr
Calcite | Level 5

What is the USP to pitch when setting up a non-distributed version of SAS VA? My customer is most interested in seeing the benefits of SAS VA over its competitors in terms of speed, accuracy, file storage and the distribution system. I am sure most of you have come across the same situation many times. As the non-distributed version doesn't support Hadoop, I am looking for the key things that distinguish VA from its competitors.

11 REPLIES
LinusH
Tourmaline | Level 20

Since the infrastructure part of the distributed server environment is left out, that leaves only the in-memory server. Performance is still good, but that is not necessarily a USP, since you could optimize traditional DBs/cubes for fast processing as well.

So, I would say the UI (functionality, look and feel) is the main USP here. How unique it is compared to the competition I can't say, but my guess is that the more advanced analysis/statistics functionality stands out.

Data never sleeps
sat_lr
Calcite | Level 5

Surely it is...

Is it the case that in the non-distributed (non-co-located data provider) version of SAS VA, data files are stored in Hadoop but the HDFS and MapReduce features are not available?

And, in contrast, are all these features available in the distributed (co-located data provider) version of VA?

LinusH
Tourmaline | Level 20

No, HDFS is not available here (unless you create your own separate Hadoop install and make it available to VA using SAS/ACCESS to Hadoop).

MapReduce is not used by VA to my knowledge - HDFS is just used to stream and bulk load data to the LASR server.

Data never sleeps
sat_lr
Calcite | Level 5

OK, so in this case (no co-located data provider, i.e. a version of VA without HDFS), how do files get stored in memory? In what format? With what kind of DB structure?

LinusH
Tourmaline | Level 20

I'm starting my first hands-on project in a couple of weeks, with a non-distributed edition, and will be able to answer these kinds of questions more precisely then.

But, I think that you read from any available data source via a SAS libname engine.

Data never sleeps
DavidHenderson
SAS Employee

LinusH is correct. 

For complete clarity...

With both Distributed and Non-Distributed, any data source that is available to SAS can be put directly into LASR. We frequently call that streamed-in, as the data is coming from a remote source, through the SAS workspace server and into LASR. The result is an in-memory LASR table that was never written to disk.

The other option, again with both Distributed and Non-Distributed, is that any data source that is available to SAS can be copied to the LASR server prior to the load. With a distributed LASR server, you have the option of using the SASHDAT library engine to distribute the data across the cluster. The analogous approach on a non-distributed LASR server is to pull the data from your remote data source and write it into a SAS dataset on the LASR server. (SAS is always installed on that server as well, so this is no problem.) Once the data is local (whether on the Distributed system in HDFS or on the Non-Distributed system's disk as a SAS dataset), the data is loaded into LASR memory.

The resulting table in LASR is the same either way and, once it is loaded, LASR is capable of performing the same operations against it.

sat_lr
Calcite | Level 5

Hi David,

Thanks for your brief reply; it was very useful. In a distributed environment, using the SASHDAT library engine, data can be distributed across the cluster. I went through some of the documentation on the Hadoop website and, of course, the SAS website. I could see that the data is replicated across nodes and stored, which helps in many ways, such as load balancing and optimizing query processing time (too many to mention here). Today, analytics vendors from small to big are moving towards in-memory architectures. With the non-distributed version, are there any key benefits the customer would get? I am looking for key differences that I can point to between SAS and other in-memory analytics vendors.

satlr

DavidHenderson
SAS Employee

@sat_lr, as an R&D person, my expertise is in the technical details and implementation of our product. Unfortunately, I am not that familiar with what our competitors offer. I have sent an email to a group of people who will be able to provide more information, so stay tuned.

sat_lr
Calcite | Level 5

Thanks David...appreciate it!

LinusH
Tourmaline | Level 20

Just to make this even clearer...

When using non-streamed-in loading in a non-distributed environment, you mention SAS data sets. Are those standard Base engine data sets? Can you use any other engines here? Suppose we have SPDE/SPD Server installed on this node; it would make sense to use their multi-threaded I/O capabilities. The same could be true for other SAS/ACCESS engines as well.

Data never sleeps
DavidHenderson
SAS Employee

Yes, I am talking about standard SAS datasets accessed via the base engine.

I discourage you from putting SPDE/SPD Server on the same machine as LASR. Those two are likely to compete heavily for resources, so you are likely to have performance issues. As for other SAS/ACCESS engines... again, I would recommend that the data sources are NOT placed on the same machines.

Of course you will likely still need to use the SAS/ACCESS engines to pull the data to the SMP server.  Once that is done, the choice is now yours...  As I said, you can place it into datasets on that machine, or stream it directly into memory.  Streaming it directly into memory clearly is faster than writing it to disk, just to read it back and put it into LASR memory.  But if pulling data from the remote source is slow for whatever reason, it is sometimes good to have a local copy.
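To make the trade-off above concrete, here is a rough cost sketch in Python. It is not SAS code and not an official sizing method; the function names and all timings are hypothetical, and it assumes the staged path always writes the local copy to disk and reads it back, even on the first load.

```python
import math


def stream_total(t_remote, loads):
    """Total time if every load pulls from the remote source straight into memory."""
    return t_remote * loads


def staged_total(t_remote, t_write, t_local_read, loads):
    """Total time if the data is pulled once, written to a local dataset,
    and every load into memory then reads that local copy."""
    return t_remote + t_write + t_local_read * loads


def break_even_loads(t_remote, t_write, t_local_read):
    """Smallest number of loads at which staging is no slower than streaming.

    Returns None when the local read is not faster than the remote pull,
    because staging can then never catch up.
    """
    if t_local_read >= t_remote:
        return None
    # staged <= stream  <=>  t_remote + t_write <= loads * (t_remote - t_local_read)
    return max(1, math.ceil((t_remote + t_write) / (t_remote - t_local_read)))


# Hypothetical timings in seconds: slow remote pull, fast local disk.
print(stream_total(600, 1), staged_total(600, 120, 60, 1))
print(stream_total(600, 2), staged_total(600, 120, 60, 2))
print(break_even_loads(600, 120, 60))
```

With these made-up numbers, streaming wins for a single load, while the local copy pays off from the second load onward (for example, after a server restart). If the remote source is as fast as local disk, the sketch reports that staging never pays off.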
