Our client needed to migrate from on-premises SAS Enterprise Case Management 6.3_M1 to SAS Visual Investigator on Viya 3.5 resident on RHEL servers within AWS. It was in service for over 5 years. In that time, the team of fraud investigators produced and stored over 66,000 cases with attachments, containing over 200,000 attachments. All attachments were only associated with cases. One of the biggest challenges was extracting those attachments in preparation for migration to SAS Viya Visual Investigator. Extracting these attachments from SAS 9.4 was assigned to me.
The legacy system consisted of a SAS 9.4_M3 deployment residing upon an on-premises RHEL 7.9 (Maipo) host collection. The SAS Solution was the SAS Fraud Framework. This solution includes SAS Enterprise Case Management. This tool provided an interface for analysts to analyze, collect, and augment data surrounding suspected fraud or violations incidents. It also allowed managers to assign cases within a workflow and, in general, disposition the cases as required. One feature of the system allowed the linking of supporting documents as attachments to a given case. These attachments were stored within the SAS Content Server. The SAS Content Server provided WebDAV-based protocols for accessing these attachments. The WebDAV protocols extended the HTTP protocol by adding write access, version control, search, etc. Hence, access to the stored attachments within the SAS Content Server can be leveraged by URL reference.
Figure 1 SAS 9.4 Maintenance 3 Enterprise Case Management Logical Environment
I noted that our system was set up using two services for attachment access via a URL. One was the expected one as determined from the Shared Services database, where the case_rk is the case reference key value found in the database and the parent is the initial set of numbers of the case_rk before the final three digits of the same (i.e., parent is 28 if case_rk is 28123).
https://<your mid-tier machine>:8343/SASContentServer/repository/default/sasdav/Products/SASEntCaseManagement/Common/Attachments/Case/0/<parent>/<case_rk>
Figure 2 Illustration of Shared Services Database Attachment Folder Structure Within Browser
The other strung the cases right from the ‘sasdav’ directory:
https://<your mid-tier machine>:8343/SASContentServer/repository/default/sasdav/0_<case_rk>
Figure 3 Illustration of Enterprise Case Management Database Attachment Folder Structure Within Browser
A browser allows one to download attachments directly from these endpoints. The cURL command provides the same access for a batched script. However, string manipulation, while possible within bash shell-scripting and much improved from earlier iterations, is still not ideal within there. SAS provides powerful string manipulation techniques and an interface to the operating system when that is more appropriate. I settled upon SAS as the main driver with bash scripts called from SAS to augment in key areas such as the build-out of directories and cleanly executing the cURL command.
Examining the endpoint URL’s above, one just needs to make a call to the specific case directory (such as 123028 or 0_74270) to extract a response file that contains the full URL paths to each of the attachments found therein. Each of these fully qualified URL’s begin with ‘https’. One could, in theory, make a call to the parent directory of all the cases, obtain the response file containing the list of cases, then set up an iterative process to obtain the response file of each case, but I decided against this and relied on the available databases to provide the target set of cases. Also, this would force one to also go through all cases not having attachments, a very fruitless waste of computing time.
I created two sets of programs, one for the Shared Services database, the other for the ECM database. Both the list creation and attachment extraction programs are quite similar, but I found it to be beneficial to have the separate pairs under our time restraints.
You should ensure that the xcmd option is active in your SAS 9.4 environment. The following code does this for you:
The log output will contain the following if it is active, as shown here:
NOTE: Writing HTML5(EGHTML) Body file: EGHTML
27
28 proc options option=xcmd; run;
SAS (r) Proprietary Software Release 9.4 TS1M7
XCMD Enables the X command in SAS.
If you encounter ‘NOXCMD’, it is not activated, and you need to ask your administrator to enable it:
Solved: How to enable XCMD on SAS Linux - SAS Support Communities
You should know how to encode a password in SAS. The following code does this for you:
proc pwencode method=sas004 in='<your password goes here>'; run;
The log output will contain the encoded password, as shown here:
NOTE: Writing HTML5(EGHTML) Body file: EGHTML
27
28 proc pwencode method=sas004 in=XXXXXXXXXX; run;
{SAS004}FC1C266B5A679262983A65A8845432543884C76BF05F9C14
NOTE: PROCEDURE PWENCODE used (Total process time):
real time 0.33 seconds
cpu time 0.07 seconds
Within the list creation programs, the extract from the ECM database follows:
proc sql;
create table attach_cases as
select a.case_id, b.*
from ecm_db.case_live a
inner join ecm_db.ecm_event b on a.case_rk = b.business_object_rk
inner join ecm_db.case_x_user_group c on b.business_object_rk = c.case_rk
where
a.case_type_cd = 'ENUM'
and c.user_group_nm <>'P43708P'
and b.event_type_cd in ('ADDATT', 'DELATT')
and business_object_nm = 'CASE'
order by a.case_id
;
quit;
Your exact query’s WHERE clause will vary, as the user_group_nm is used to filter out test cases as defined in our env and you may or may not want to break out by case_type_cd, depending on the number of cases you have. The event_type_cd is ensuring that only attachment additions or attachment deletions are being captured. Output will be like this:
A different approach was used to get the proper records from the shared services database. First, a separate sql program was created on the system (grab-attchs.sql) containing the following:
The ‘WHERE OBJECT_TYPE_ID=545000’ ensures that we capture cases. The ATTACHMENT_PATH provides the relative URL path beginning with ‘Products’ and ending with the name of the attachment. Two example observations from the csv file follow:
Products/SASEntCaseManagement/Common/Attachments/Case/0/155/155974/012345678-Consent form for.tif
Products/SASEntCaseManagement/Common/Attachments/Case/0/155/155974/012345678-Other type of do.tif
A SAS call to the OS via x-command calling psql executes the sql code and the resulting csv file is retrieved into the SAS program via a filename and data step.
The resulting data set is called caseout. Once within SAS, subsequent processing extracts the <parent> as prnt, the case reference key as case_rk, and the attachment file name as attch. These are the last 3 portions of the strings above. Code is here. It basically shatters the string into 9 pieces along the forward-slash delimiter and keeps what we need:
data parse0;
length prnt case_rk attch $100.;
set caseout;
array pieces[9] $ 100 piece1-piece9;
do i = 1 to 9;
pieces[i] = scan(casersp,i,'/');
end;
prnt=piece7; case_rk=piece8; attch=piece9;
drop casersp i piece1-piece9;
run;
Please note that in the program obtaining the list for ECM database, we ensure that no duplicates exist that are also in the SSDB database, whereas in the program for the SSDB, we gather additional information from the ECM database for any SSDB entries found therein.
Once the list is gathered in either version of the code, we turn to the attachment extraction program. This is the true engine of the setup. We’ll illustrate with the Shared Services database version. The start of the process is in this data step:
%let list=list_v3;
data _null_;
length str $1024.;
set mylib.&list.;
str='%nrstr(%curl_loop('||left(trim(case_rk))||',%bquote('||left(trim(prnt))||'),'||left(trim(case_id))||'));';
call execute (str);
run;
In the data step loop, a string is constructed using the case_rk , prnt, and case_id values for a given case observation. The string is passed to a call execute statement which executes a macro named %curl_loop. The %curl_loop macro takes the case_rk and prnt values and uses them in an x command that executes a shell script called ‘get_case_v3.sh’. Besides some variable setup and potential file cleanup, the main function of the bash script is to use the cURL command to create a response file called $case.txt (where $case=<case_rk>.
In ‘get_case_v3.sh', the cURL command with the setup is as follows:
The response file output within $case.txt is as follows. The 3 attachment references are preceded by 'href=' (Ignore the last 'href='):
<html><head><title>Jackrabbit 2.4.0 /default/sasdav/Products/SASEntCaseManagement/Common/Attachments/Case/0/517/517471</title></head><body><h2>/default/sasdav/Products/SASEntCaseManagement/Common/Attachments/Case/0/517/517471</h2><ul><li><a href="..">..</a></li><li><a href="https://<your midtier machine>:8343/SASContentServer/repository/default/sasdav/Products/SASEntCaseManagement/Common/Attachments/Case/0/517/517471/2020-517472_ZZ%20Other%20Doc%20(1).pdf">2020-517472_ZZ Other Doc (1).pdf</a></li><li><a href="https://<your midtier machine>:8343/SASContentServer/repository/default/sasdav/Products/SASEntCaseManagement/Common/Attachments/Case/0/517/517471/2020-517472_ZZ%20Other%20Doc%20(2).pdf">2020-517472_ZZ Other Doc (2).pdf</a></li><li><a href="https://<your midtier machine>:8343/SASContentServer/repository/default/sasdav/Products/SASEntCaseManagement/Common/Attachments/Case/0/517/517471/2020-517472_Supporting%20Documents.docx">2020-517472_Supporting Documents.docx</a></li></ul><hr size="1"><em>Powered by <a href="http://jackrabbit.apache.org/">Jackrabbit</a> version 2.4.0</em></body></html>200
Returning to the %curl_loop macro, the response file ($case.txt above) contents are pulled into SAS as a string and a count is performed of all potential substrings beginning with ‘https’ (this assumes the https protocol was used; if not, you’ll need to do some additional coding to control-break on the string ‘jackrabbit.apache.org’).
This counting of attachments beforehand allows us to decide in the code-flow whether to create a directory or not. The bash scripting for the directory creation is shown below. This is in the 'mkdir.sh' within the attached WinZip file:
As you can see above, you’ll need to define an Attachment Landing Zone location on your server for the mkdir.sh script and in the SAS code as a parent directory for the generated case attachment directory. Here’s a peek at what it’ll look like after running through some cases with attachments:
Figure 4 Set of Case Attachment Directories in Landing Zone
Assuming that we have attachments and did not get a zero for the count, the same response-file string used to get the above-zero count is again set in a data step. The data step is as below, but with additional comments to explain the string manipulations:
data endpt1;
length parse_me $4096. str $1024. ;
set caseout;
parse_me=casersp;
array url {&cnt.} $1024.;
array attch {&cnt.} $512.;
array otfl {&cnt.} $512.;
do i=1 to &cnt.;
/* take raw string, locate beginning of the 'https' portion,
then substring at start of 'https'
*/
bgn=index(parse_me, 'https');
parse_me=substr(parse_me, bgn);
/* now locate the ending double-quote, remove it and all following it
to create the fully-qualified url/attachment name, then place that into the url array
*/
end_=index(parse_me, '"');
url{i} = substr(parse_me, 1, end_-1);
/* find start of the attachment name by finding the position of the last slash in reverse ('b'=backwards in findc function)
then add one to that position to move ahead of the '/', place that attachment name into the array
*/
strta=findc(url{i},'/','b');
attch{i}=substr(url{i},strta+1);
/* need to replace the '%20's with underscores within the attch{i} entries
and use new value w/the '-o' flag in curl command following
*/
otfl{i}=prxchange('s/%20/_/', -1, left(trim(attch{i})));
/* CANNOT PLACE A '(' SIGN INTO THE TARGET FILE NAME */
otfl{i}=prxchange('s/\(/_/', -1, left(trim(otfl{i})));
/* CANNOT PLACE A ')' SIGN INTO THE TARGET FILE NAME */
otfl{i}=prxchange('s/\)/_/', -1, left(trim(otfl{i})));
/* CANNOT PLACE A '#' SIGN INTO THE TARGET FILE NAME, HAS WILDCARD MEANING IN cURL COMMAND!!!!! */
otfl{i}=prxchange('s/\#/_/', -1, left(trim(otfl{i})));
/* REMOVING SEVERAL CASES OF UNWANTED CHARACTERS WITHIN FILENAME */
otfl{i}=prxchange('s/%23/_/', -1, left(trim(otfl{i})));
otfl{i}=prxchange('s/%26/_/', -1, left(trim(otfl{i})));
otfl{i}=prxchange('s/%2c/_/', -1, left(trim(otfl{i})));
otfl{i}=prxchange('s/%5b/_/', -1, left(trim(otfl{i})));
otfl{i}=prxchange('s/%5d/_/', -1, left(trim(otfl{i})));
otfl{i}=prxchange('s/%e2/_/', -1, left(trim(otfl{i})));
otfl{i}=prxchange('s/%80/_/', -1, left(trim(otfl{i})));
otfl{i}=prxchange('s/%8e/_/', -1, left(trim(otfl{i})));
otfl{i}=prxchange('s/%99/_/', -1, left(trim(otfl{i})));
otfl{i}=prxchange('s/%93/_/', -1, left(trim(otfl{i})));
otfl{i}=prxchange('s/%3d/_/', -1, left(trim(otfl{i})));
/* Construct macro-call string to feed to 'call execute',
remember that the 'glued' macro variable is jsut the case-id without the dash '-'
*/
str='%call_bash('||&case_rk.||','||&prnt.||',%bquote('||left(trim(otfl{i}))||'),'||&glued.||',%bquote('||left(trim(attch{i}))||'));';
call execute (str);
/* advance upon string to seek next 'https' and remember that 'end' is where the closing
double-quote is positionally within the url, so we move the position immediately following it
*/
parse_me=substr(parse_me, end_+1);
end;
* put _all_;
run;
Three arrays are setup for the url beginning with ‘https’, the attachment name, and the output file name (otfl) generated for the attachment.
A do-loop is entered, iterating for each occurrence of an attachment via some string manipulation (explained in comments above). Various html encodings are removed and replaced for various reasons within the attachment name to produce the output file name.
Finally, for each attachment, a string is formed pulling in the attachment name, the output file name, the case reference key, the case-id (without a dash), and the parent. This string is then used in a call execute statement to pass to the %call_bash macro.
The %call_bash macro executes the x command calling the cURL command that extracts the attachment and lands it in the proper case directory. %call_bash with its supporting %let statements is as below:
/* Attachment Landing Zone Location on Server.*/
%let ATTCH_LZ=/sas/Work/attch_typed_v3/attchmnts;
%let uidpwd=sasadm@saspw:<your encoded password>;
/*root of full url*/
%let rurl=https://<your mid-tier machine>:8343/SASContentServer/repository/default/sasdav/Products/SASEntCaseManagement/Common/Attachments/Case/0;
/* THE FOLLOWING %let statement WAS NECESSARY IN OUR ENV, MAY NOT BE IN YOURS
If not necessary in your environment, note that you MUST remove the "&lib64" command (and its terminating semicolon) from the 'x' command below*/
%let lib64=export LD_LIBRARY_PATH=/lib64:$LD_LIBRARY_PATH;
/* flags used to make curl more scriptable*/
%let crlopt=%str(--fail --connect-timeout 3 --retry 0 -s -k );
%macro call_bash(crk, prnt, otfl, glued, attch);
x "&lib64.; curl %bquote(&crlopt.) ""&rurl./&prnt./&crk/&attch."" -u %bquote(&uidpwd.) -o ""&ATTCH_LZ./&glued./&otfl"" ";
%mend;
Here's a picture of attachments landed within a given case directory from 3 separate curl commands, as called above. The case in question had a count of three entering the do loop:
Figure 5 Sample of Attachments Under a Case Directory
Back in the data step, the next ‘https’ is discovered by advancing along the string so the next attachment can be processed in like manner within the DO-loop. When the DO loop iterates past the discovered count of ‘https’ instances within the response file string, it ends.
Finally, after the data step is exited, an OPTIONAL x command is called to execute the rsync command to move the attachments to a directory that is named after the case-id with a dash. This was a special request after the fact and will depend on how you named your cases. Your environment will drive how you need to customize any provided code-base.
A couple of points to keep in mind:
Please reach out to tom.gaughan@sas.com for any questions you have.
Although the loading of the attachments to SAS Viya Visual Investigator is outside the scope of this discussion, a quick rundown of the environment, a brief description of the load to Visual Investigator and the contact to the developer are provided here.
The target system is a SAS Viya 3.5 deployment residing upon Amazon Web Services within a RHEL 8.6 (Oopta) host collection. SAS Visual Investigator on Viya is the successor to the SAS Fraud Framework. It provides the same functionality in terms of providing an interface for analysts to analyze, collect, and augment data surrounding suspected fraud or violations incidents. It also allows managers to assign cases within a workflow and, in general, disposition the cases as required. As is expected within such a system, linking of supporting documents as attachments to a given case is a feature. These attachments are stored within the SAS Infrastructure Data Server. In the case of a migration from a legacy system, a developer can leverage REST API’s to bulk load attachments to the system.
Tejas Kharva developed the VI load process for our migration. The brief description he provided is as follows:
GET call to https://{{host}}/folders/folders/@item?path=/visual-investigator
POST call to https://{{host}}/files/files?parentFolderUri=/folders/folders/<folder_hash>
POST call to https://{{host}}/svi-datahub/documents/<object_name>/<object_id>/attachments
Figure 6 SAS Viya 3.5 Visual Investigator Environment
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.