You're working late, responding to a request for information (RFI). You've just answered 42,893 questions (that's what it feels like, right?) and you look to the next one on the list:
RFI Question #42894: Does the data storage solution support binary large objects (Blobs)?
Short answer: Yes
Long answer: Keep reading...
Though "BLOB" is common parlance, the actual CAS data type for large binary data is "VARBINARY." Theoretically, the VARBINARY data type could hold any type of "file" data, e.g. an audio file, an image, a PDF, an archive file, an MP4, whatever. However, in Viya 3.4, you'll only be able to get certain kinds of data into it.
With the 3.3 release, Viya introduced a set of image processing capabilities, including the loadImages CAS action. This action loads image files (e.g. JPG, PNG, DICOM) from path-based CASLibs, so you'll need to transfer or mount your image files to the CAS controller to get them into CAS.
Using the loadImages CAS action looks like this:
proc cas;
   image.loadImages / caslib="CASUSER"
      path="list.txt"      /* file that lists the images to load */
      decode=TRUE
      pathIsList=TRUE      /* PATH points to a list file, not an image */
      casout={caslib="CASUSER", name="imageTable", replication=0, replace=true};
run;
quit;
The PATH parameter can point to a file that lists the images (as shown) or it can be left blank and CAS will load any image files it finds in the CASLib DataSource location.
Once loaded, the image blobs are placed into the _image_ field inside the target CAS table.
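If you want to confirm this for yourself, a quick check is to list the table's columns. This is only a minimal sketch, assuming the table name and CASLib from the loadImages example above; the _image_ column should be reported with a varbinary type.

```sas
proc cas;
   /* List the columns (name, type, length) of the table created by loadImages. */
   /* Assumes the CASUSER caslib and the imageTable name used in the example above. */
   table.columnInfo / table={caslib="CASUSER", name="imageTable"};
run;
quit;
```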
With the 3.4 release, Viya introduced a set of audio processing capabilities. Like the imaging functionality, this audio package contains its own load CAS action, loadAudio. Again, the action works only with path-based CASLibs. So, as with image files, you'll need to transfer or mount your files to the CAS controller to get them into CAS.
Using the loadAudio CAS action looks similar to the loadImages action:
proc cas;
   audio.loadAudio / caslib="CASUSER"
      path="list.txt"
      casout={caslib="CASUSER", name="audioTable", replication=0, replace=true};
run;
quit;
Once loaded, the audio blobs are placed into a VARBINARY field inside the target CAS table, just as with images.
What about other file types, e.g. PDFs, xls, doc, ...? Can we load those?
Here, again, the short answer is yes but with some reservations.
Again, the longer answer is below.
In Viya 3.4, the loadTable CAS action can load complex text document formats like PDFs, Word docs, and PPTs when used with the FileType="DOCUMENT" option. (This manifests in the user interface as the Documents Directory Import.)
You might think these would come in as BLOBs (VARBINARY fields). However, they are actually converted to text and stored as VARCHAR. This is great for text analytics but if your goal is to keep the document as is, then this will not help you. As any desktop app will tell you, you lose formatting and document metadata when you convert a document to text.
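For reference, a document load might look roughly like the sketch below. Only the loadTable action and the FileType="DOCUMENT" option come from the discussion above; the empty path (to pick up the whole CASLib directory), the fileExtList and tikaConv import options, and the docTable name are assumptions on my part, so check them against the loadTable documentation for your release before relying on them.

```sas
proc cas;
   /* Sketch only: load the PDFs found in a path-based caslib as text.   */
   /* Assumptions: path="" means "the caslib directory"; fileExtList and */
   /* tikaConv are import options as I recall them; docTable is a        */
   /* hypothetical output table name.                                    */
   table.loadTable /
      caslib="CASUSER"
      path=""
      importOptions={fileType="DOCUMENT", fileExtList="pdf", tikaConv=true}
      casOut={caslib="CASUSER", name="docTable", replace=true};
run;
quit;
```

The resulting table holds the extracted text in a VARCHAR column rather than the original file bytes, which is exactly the trade-off described above.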
Also, you can't bring in non-image and non-audio files with the respective loadImages or loadAudio CAS actions. These actions simply ignore any file types that don't meet their input requirements.
In Viya 3.4, no connector supports BLOBs, whether images, audio, or otherwise. So if you want to bring in audio or image files from a database, you'll have to stage them on the CAS controller file system and load them using either the loadImages or loadAudio CAS actions.
Only a limited set of CAS actions support VARBINARY columns in Viya 3.4: essentially the image, audio, BioMedImage, and ASTORE action sets, and possibly a few more.
The majority of CAS actions, procedures, and the DATA step do not support tables with VARBINARY fields and will error in various ways if they encounter them.
While CAS offers only limited support for BLOBs, Viya 3.4 as a whole offers a considerable amount of functionality for binary data. When you consider CAS' BLOB capabilities together with Visual Analytics' embedded content capabilities and CAS' document conversion capabilities, Viya 3.4's binary data support is robust.
Here are just some of the ways Viya 3.4 can utilize binary (file) data, many of which have been mentioned already:
So, while you might not be able to load all BLOB files into CAS, between Visual Analytics and CAS, Viya can meet most business requirements around binary data.