07-26-2023
boemskats
Lapis Lazuli | Level 10
Member since
04-26-2013
- 164 Posts
- 234 Likes Given
- 17 Solutions
- 130 Likes Received
About
Background in SAS Business Intelligence Consulting. Founder of Boemska Technology Solutions Ltd. (boemskats.com)
Enterprise architecture
Solution Design
Web Programming
Activity Feed for boemskats
- Got a Like for Re: Anyone interested in developping a VS Code extension for SAS language?. 02-12-2021 10:53 AM
- Liked A Giraffe Taught me Kubernetes! for Ali_Aiello. 10-26-2020 04:44 AM
- Liked How do I use Git with my SAS projects? Q&A and webinar recording for ChrisHemedinger. 09-30-2020 04:07 PM
- Posted Finding the Metadata owner of a Workspace session or SASWork directory on Linux on SAS Communities Library. 09-18-2020 02:39 PM
- Liked Azure Storage for SAS Architects for EdoardoRiva. 09-03-2020 06:27 AM
- Liked Overview of container technology: A quick introduction to all things containers for ShelleySessoms. 08-08-2020 07:50 PM
- Liked Re: Is it possible to limit xcmd to certain users? for ronan. 07-28-2020 03:32 AM
- Liked Connecting from SAS 9.4 to Azure Quickstart for FrederikV. 07-28-2020 02:39 AM
- Liked My first development with VA SDK using Node.js for XavierBizoux. 06-24-2020 10:08 AM
- Liked Directing CAS when to use its cache (or not) for RobCollum. 06-24-2020 10:08 AM
- Liked My favorite Administration papers from SAS Global Forum 2020 for cj_blake. 05-05-2020 11:34 AM
- Posted Re: Sample Script not storing/identifying the correct PID on Administration and Deployment. 02-28-2020 08:32 AM
- Got a Like for Re: SAS VIya 3.4 Preinstallation task is failing. 01-13-2020 01:22 AM
- Liked Re: stdin stdout for Tom. 10-26-2019 08:05 AM
- Got a Like for Re: ERROR SAS Web Infrastructure Platform. 10-16-2019 04:19 PM
- Got a Like for Re: ERROR SAS Web Infrastructure Platform. 10-11-2019 04:54 PM
- Got a Like for Re: Stopping a running scheduled job. 10-03-2019 10:30 PM
- Posted Re: Stopping a running scheduled job on Administration and Deployment. 10-03-2019 10:01 AM
- Liked SUGA webinar on October 8: ESM tool for SAS for ShelleySessoms. 10-03-2019 09:45 AM
- Liked Re: Object Spawner Major (page) Faults in the SAS Environment Manager for SimonDawson. 08-07-2019 01:52 PM
03-10-2018
08:34 AM
Are there SAS processes still running that are using that library / those libraries, stopping them from being updated? Maybe the Metadata Server, or random leftover user sessions or scheduled jobs?
I'd start with lsof or fuser & take it from there.
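For example (paths here are made-up placeholders):
# Show processes with files open anywhere under the library path
lsof +D /sasdata/yourlib
# Or check one specific dataset, showing the owning user and PID
fuser -uv /sasdata/yourlib/yourtable.sas7bdat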
03-05-2018
07:56 AM
2 Likes
It's in the SAS docs here: http://documentation.sas.com/?docsetId=omaref&docsetTarget=n1ow0lbqw8ktpwn1e37i0p90jeaz.htm&docsetVersion=9.4&locale=en
Most of the OMI documentation is awesome, there's just a lot of it. Quoting from the page I linked to:
The AddMetadata method creates a new object for every property string that is submitted in the <METADATA> element, unless the ObjRef attribute is included in the property string. ObjRef is supported only in a nested property string. It specifies that the object instance is an existing metadata object. It instructs the SAS Metadata Server to create an object reference to the specified object without modifying the specified object's other properties.
Not sure how much this applies as you're updating metadata rather than adding. I dug it up from some old code, so I'm pretty sure my 'journey of discovery' was trial & error 🙂
03-05-2018
07:49 AM
2 Likes
Hi Mark,
To add to Juan's excellent response, here's something that may also interest you. If nothing else, the Python code in our project might help your engineers build a WRS ping routine into whatever monitoring application you use at the moment.
A while ago we built a Python app that uses the Python Requests library to traverse the SASLogon application and log on to WRS/STP/Portal/VA, effectively emulating the behaviour of an end user as if they had just attempted to log on to the application and view a report. You can find it on GitHub here.
The way it works is:
- a Python collector script logs on to SAS via HTTP (traversing the SASLogon redirect bits and collecting cookies), then makes whatever subsequent HTTP requests it needs to and feeds a .csv file with the results and timings of those requests (with the .csv being symlinked to from your webroot, see below)
- a D3-based static webapp is served from your htdocs folder for users to view system availability metrics (the static HTML app periodically pulls the .csv data in via ajax; it's static so that it operates independently of your JVM)
- a Python 'aggregator' script periodically summarises that data for the historical overviews and produces three more csvs that are loaded in a similar fashion.
While the web application itself might not be of particular use to you, you could use the python components to pull availability data in periodically. You'd have a config of something like this:
{
  "hostUrl": "https://apps.boemskats.com/",
  "loginPath": "/SASLogon/login",
  "loginParams": {
    "username": "sasdemo",
    "password": "Orion123"
  },
  "applications": [{
    "name": "SAS Web Report Studio",
    "tests": [{
      "id": "id1",
      "type": "stored_process",
      "execPath": "/SASWebReportStudio/do",
      "execParams": {
        "something": "something"
      },
      "validations": {"mustContain": ["abitofyourreport"], "cantContain": ["ERROR:"]}
    }]
  }]
}
(It's been a while since I used WRS, but the idea would be to identify an HTTP call that the app makes to the server JVM, then decide what regexes it must match to count as a 'successful' call, and what it can't contain.)
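For a quick manual smoke test of the same idea before wiring the collector in, something like this curl sequence approximates a single check (host, credentials and match strings are the made-up values from the config above; a real SASLogon logon involves redirects and hidden form fields, which the Python collector takes care of for you):
# Log on through SASLogon and keep the session cookies
curl -sL -c cookies.txt \
  -d "username=sasdemo&password=Orion123" \
  https://apps.boemskats.com/SASLogon/login > /dev/null
# Replay the app request with those cookies and apply the same validations
curl -s -b cookies.txt \
  "https://apps.boemskats.com/SASWebReportStudio/do?something=something" \
  | grep -q "abitofyourreport" && echo OK || echo FAIL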
It's worth noting that when we started building ESM for SAS, identifying the root causes of Web Report Studio instability was pretty high on our agenda. This is why we still monitor both the pooled workspace sessions and the JVMs very closely (in real time), so that we can spot the moment where it all starts to overload and reconcile it with a particular user/report/infomap. There are always a couple of reports that cause instability with WRS, and identifying these means you can take preventative measures. If you're interested, feel free to drop me a line.
Lastly (and @JuanS_OCS, this will interest you), just this week we started rolling out a custom module architecture for ESM that lets us use Python modules such as the collector script above to collect real-time application availability event data straight into ESM. This will feed directly into our alerting architecture, so it is huge for us. I can't help but share 🙂
Nik
03-05-2018
06:28 AM
1 Like
If for whatever reason the answers already provided here don't give you what you need, here is an approach that might help you gather this data moving forward. It won't tell you who has logged in over the last 3 months, but it will tell you who logs in over the next 3 :). Also, as you're talking about SAS Studio and EG only, this approach will cover all bases.
Assuming you're on Linux, add the following to your WorkspaceServer_usermods.sh:
echo "$(date --iso-8601),$METAUSER" >> /somepath/useraudit.txt
(note, make sure that your somepath is writable by all users)
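A one-off setup step along these lines would do it (using the placeholder path from above):
# Run once as an admin: create the shared audit file and make it writable by all workspace users
mkdir -p /somepath
touch /somepath/useraudit.txt
chmod 666 /somepath/useraudit.txt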
This will over time give you a logfile that looks like this:
2018-03-03,sasdemo
2018-03-04,nik
2018-03-04,nik
2018-03-04,drjim
2018-03-04,drjim
2018-03-04,drjim
2018-03-05,allan
2018-03-05,drjim
You can see you have the date of the logon and the username. Should be relatively simple to read into SAS.
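For a quick look before you even get to SAS, a shell one-liner will give you per-user logon counts:
# Count logons per user, most active first
cut -d, -f2 /somepath/useraudit.txt | sort | uniq -c | sort -rn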
For something a bit more useful in terms of managing metadata identities, once you've read the above in you can reconcile the logins with their corresponding metadata user objects using something like this.
data stuff;
  format userID identityID userName userDisplayName $200.;
  * read in your parsed log dataset ;
  set my_logfile_with_user_field_as_userID;
  * get identity of the user that logged on ;
  rc=metadata_getnasn(cats("omsobj:Login?@UserID contains '",userID,"'"),
                      "AssociatedIdentity", 1, identityID);
  * get identity name & displayname properties ;
  rc=metadata_getattr(identityID, "Name", userName);
  rc=metadata_getattr(identityID, "DisplayName", userDisplayName);
run;
You'll need to run the above code with admin privileges in Metadata as it'll need to be able to read everyone's Login objects. The resulting table will tell you who logged in and when.
Nik
03-04-2018
02:25 PM
1 Like
Hi Martin,
Try this instead, it should work.
<Person Id="A5NUQPXO.AP00002V">
<IdentityGroups>
<IdentityGroup ObjRef="A5NUQPXO.A500001C" />
</IdentityGroups>
</Person>
Nik
02-19-2018
12:17 PM
1 Like
When you say the 'sql process', do you mean the session's connection to the database?
Having known a few DBAs, it's the kind of thing they'd get itchy about (for very little reason). The thing is, the connection isn't closed at the end of your PROC SQL step; it stays open for as long as the libname remains assigned. This link will give you more insight into how it all works:
http://support.sas.com/documentation/cdl/en/acreldb/63647/HTML/default/viewer.htm#a001342247.htm
Nik
02-16-2018
04:19 PM
1 Like
I think that the password that the SDW has stored for your SAS user is incorrect, or the account might be locked. Try running this manually on that node after the failure and see what it says:
/SAS/SASHome/SASFoundation/9.4/utilities/bin/sasumgmt -stdio -u sas -v
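If the password checks out, it's also worth ruling out a locked account. Assuming sas is a local account, this shows its status (an 'LK' in the second field means locked):
# Show password/lock status for the sas account (local accounts only)
passwd -S sas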
Nik
02-06-2018
09:04 AM
2 Likes
Hi Mathias,
The thing with JSONP is that you can't make POST requests, which means you're limited to simple URL-encoded parameter values when sending data to SAS. It's also... a bit of a hack.
I'm not a fan of it, but I'll have a chat with our devs and see if there's a way we can implement it as an optional transport mechanism. Will keep you posted.
Nik
01-25-2018
04:07 AM
2 Likes
Thanks for the mention Juan. Also don't forget my WORKtop project on GitHub; it's very effective, as long as you don't need historical data. It has a whopping 6 stars now (!!!)
🙂
Nik
01-21-2018
08:28 PM
4 Likes
Hi Jason,
This is a very good question. Yes, I do believe that suspended flows can have a considerable effect on performance. This is down to the combination of the 'disk-intensive' nature of some SAS processing and the way that operating systems deal with actually writing that data to disk. It is also a subject that @MargaretC has approached a few times, such as in her post from a couple of years ago titled 'When can too much memory hurt SAS'.
I'm not sure which OS you're using, but I'll assume it's Linux for the rest of this post. I'm not sure if LSF can suspend jobs on Windows. In any case, if you're using Windows, I'm sorry.
So, how does a kernel write data to disk? Quoting this article titled Linux Page Cache Basics:
If data is written [to disk], it is first written to the Page Cache [which is itself unused RAM] and managed as one of its dirty pages. Dirty means that the data is stored in the Page Cache, but needs to be written to the underlying storage device first. The content of these dirty pages is periodically transferred (as well as with the system calls sync or fsync) to the underlying storage device. The system may, in this last instance, be a RAID controller or the hard disk directly.
What this means is that if you have a job that is writing data to a SASWORK disk device, and the node that it is running on has a large amount of otherwise unused memory, then this job can flood the Page Cache with huge amounts of data that, as far as it is concerned, it is writing to an awesomely fast SASWORK disk. If that job is then subsequently suspended by LSF, it will be stopped from using any more CPU resource, but the kernel pdflush daemon will continue to sync the contents of the Page Cache to the disk device until all of that data is written (seeing as it effectively told the job that it had already written it to disk). This means that while your flow is 'suspended', the 'disk load' it generated while it was running can continue to have a latent effect on the performance of other jobs that are trying to use the same disk device. The severity of this effect will depend on a few things, like the amount of free memory you have that's eligible to be used as page cache, the amount of bandwidth available on your storage device, the point at which the program was suspended, and your kernel cache tuning configuration.
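Incidentally, you can watch this draining happen in real time on any Linux box:
# Watch the dirty/writeback page totals fall as the kernel flushes the cache
watch -n1 "grep -E 'Dirty|Writeback' /proc/meminfo"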
To help illustrate this, here's an example: we have a node with 128 gigs of RAM (nothing special by modern standards), 20 CPU cores (although this is irrelevant as we will only use one), and a SASWORK disk which, although it has disproportionately little bandwidth available for the purposes of illustrating this point, is still faster than what I often see on some customer sites (120MB/sec).
We run the following code on the server, and halfway through its execution we 'suspend' it:
%let howbig=12e7;
%esmtag(Create dataset);
data sascoms;
  array me {1} $200;
  do id=1 to &howbig.;
    randid = round(ranuni(0) * &howbig.);
    output;
  end;
run;
This data step creates around 20GB of data in SASWORK (which is on our 120MB/sec disk device). Here is what that looks like on a default configuration in RHEL 7.4:
First, some help interpreting these graphs:
- the top graph shows the performance of the SAS job; the bottom graph shows the performance of the node for the same time period
- the red area on both graphs is CPU. 100% in the top one signifies one _thread_, while 100% on the bottom one is the total CPU capacity available on the node
- the green bars in both show write speed: in the top graph they show the _rate at which the process is writing data to the kernel_ (so, writing to the cache), while in the bottom graph they show _the rate at which the kernel is actually writing that data to the device_ (so, flushing that page cache to disk)
- finally, the grey area 'descending' from the top of the bottom node graph is the measured size of the page (buffer) cache, which includes both pages that have been flushed to disk and pages that have yet to be flushed (dirty)
We can therefore observe the following: the job starts executing the code above at 15:17:42, writing to the kernel (page cache) at ~750MB/second, which is the SASWORK throughput the piece of code above requires in order to sustain near 100% CPU utilisation (i.e. to fully utilise a single thread). When the job is then suspended around 25 seconds in, the job's CPU and IO load drop away to 0, and the cache stops growing (grey bit on the bottom graph). However, by this point, as far as the job is concerned it has written a 20GB dataset to SASWORK, and by looking at the bottom graph you can see that the kernel continues to flush the page cache to the disk device even though the job is in a suspended state, continuing to max out the write throughput of the disk device. In total, it takes an extra 1m20s after the job is suspended to finish syncing the data that it managed to 'write' to the cache in the ~20 seconds it was active. In other words, in this example the 'latent dirty cache effect' lasts almost 4x longer than the actual runtime before it was suspended, and would almost certainly continue to impact the performance of any flows that were resumed following the first flow's suspension.
Luckily, like many other things on Linux, the size of this dirty page cache is tunable. Here is how the same program behaves when the vm.dirty_ratio tuning parameter is reduced from 40 to 1, telling the kernel that instead of the (default on RHEL 7) 40%, only 1% of total free memory should be used for the dirty page cache:
This time round the job starts executing at 15:23:50. The write throughput between the job and the cache initially spikes to 372MB/sec, but this is almost immediately throttled down to a much saner 120MB/sec as soon as the (now much smaller) dirty page cache fills up, and the size of the cache (bottom graph) grows gradually, unlike before. As a result, when the job is suspended at 15:24:45, the kernel only takes another 10 seconds or so to finish flushing the dirty page cache to disk. Much better: this suspended job wouldn't affect the performance of other newly started flows anywhere near as much as the first one did. And when the job is resumed at 15:25:30, it just picks up where it left off.
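For reference, the change used for this second run is a one-line sysctl tweak (shown here with the values from this test; as the next paragraph explains, 1 is not a blanket recommendation):
# Check the current value (the RHEL 7 default is 40)
sysctl vm.dirty_ratio
# Lower it for the running system only (does not survive a reboot)
sysctl -w vm.dirty_ratio=1
# To persist it, add 'vm.dirty_ratio = 1' to a file under /etc/sysctl.d/ and load it with 'sysctl --system'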
So, there's our answer, right? In order to stop suspended flows adversely affecting the performance of active ones, we should simply make the dirty page cache tiny?
Not quite. Here's what happens when, instead of suspending that job mid-execution, we let it do its thing and carry on to completion:
When the job completes, it cleans up the SASWORK files it believes it has written to its work directory on disk, which clears them from the dirty cache and stops them being dumped to disk at all. Voila. Not only that, but the job completes in 30 seconds, rather than the 3+ minutes it would take if it had to rely on the SASWORK disk device throughput alone.
Of course, this is all for illustration purposes, and the severity of this effect will depend on the actual performance profile of your code. Even so, with our ESM customers we do see a surprising number of jobs in the wild that seem to create some sizable temporary files immediately before termination. I guess that if nothing else, this is also a good way of illustrating the importance of proactively deleting SASWORK datasets in your jobs as soon as you know they're no longer needed to save them being flushed to disk for no reason. The bigger your cache, the more difference doing this will make.
Now this may seem like an extreme example, as no job would write a temporary SASWORK file only to immediately delete it. But this is exactly how SAS uses UTILLOC, which is why I consider it the most significant element of this 'suspended job cache hangover effect'. SAS procedures that use UTILLOC as their temporary storage normally only use those temporary files for the duration of the step's execution, and the files are deleted from the UTILLOC disk as soon as the step finishes. What this means is that suspending a job while it's halfway through a PROC SORT will produce exactly the detrimental effect shown in the first screenshot above, while letting it finish what it's doing and clean up would likely result in a decent performance profile much more like the one shown in the third screenshot. It is in these scenarios that 'suspending' a flow hurts performance the most, and I do think the effect is very significant, and with the right tooling, very measurable.
So, TL;DR: yes, suspended flows can affect overall grid performance. My advice would therefore be to avoid suspending flows where possible, concentrating instead on optimising both your schedule and the efficiency of any jobs on the critical dependency path within your schedule. If you have to suspend flows, spend some time tuning your cache, and ensure that your config aligns with the tuning guidelines provided by Margaret's team in collaboration with Barry Marston and the guys from Red Hat.
Lastly, seeing as you're having issues with performance, I would highly recommend that you try using Boemska ESM, the performance tuning product for SAS GRID that you can see in these screenshots. We have clients that spent a lot of time and effort trying to improve the performance of their GRID environments using traditional tuning methods before trying ESM, and still managed to make 20-25% gains in performance and batch capacity within weeks of installing our product. I know @JuanS_OCS, for one, is a big fan :). If you're interested feel free to get in touch with me directly.
In any case, I hope this answers your question.
Nik
01-18-2018
05:35 PM
3 Likes
Thanks for the mention Juan! So, for better or for worse, I'm far, far, far from an LSF expert (or fan :/). But with that in mind, here's how we approach the jobID / unique key issue:
We tend to ignore the LSB_JOBID variable entirely, although we pick up the $LSB_JOBNAME var so that we can record the flow/subflow info for each session, which in turn allows us to visualise jobs quite nicely using drillable treemaps. Instead of using LSF generated IDs, we generate a 'UID' for each session by sourcing our esmconfig.sh file and generating a variable whenever each lev's appservercontext_env_usermods.sh file is sourced. Historically this was a variant of something like export ESMGUID=$(date +'%s')$$_$(hostname), which near enough ensured a unique key for each session started from each server in our GRID; this creates a unique key of [this second][this pid][on this host]. What we would then do inside our autoexec code is this:
newguid = "&ESMGUID";
if envlen('ESMPARENTUUID') > 0 then do;
/* This suggests that this variable was set by a parent session so this is a child */
currentguid = sysget('ESMPARENTUUID');
newUUIDstring = cats(currentguid, '-', newguid);
call symputx('ESMJOBUUID',newUUIDstring);
end;
else do;
/* This suggests that this session is the main job that collects all the RCs */
call symputx('ESMJOBUUID',newguid);
end;
options set=ESMPARENTUUID="&ESMJOBUUID.";
What this gives us is a good way of guessing, within the autoexec, whether the process that just started is a 'parent' session for a job, or a child (GRID) subsession, the performance data for which should be reconciled with that of the parent session. It's the best mechanism we have, so far, of building a tree of GRID subsessions (by parsing on the '-' character separators), while maintaining a linkable, unique UUID for each session.
Finally, it's worth mentioning that last year we moved away from using `ESMGUID=$(date +'%s')$$_$(hostname)` as our identifier generator, instead using a one-liner Java UUIDgen program to generate something that's guaranteed to be unique. It's less human-readable than the old format and no longer sortable by date, but it increases the chances of 'uniqueness' considerably; it's unlikely that a kernel would assign the same pid to two processes on the same host within the same second, but it's even more unlikely that UUID duplication would ever occur on that same host.
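If you'd rather not carry a Java helper around, most Linux distributions ship a uuidgen binary (part of util-linux) that does the same job — a sketch, not necessarily what we ship:
# Generate a random (type 4) UUID for the session instead of the second/pid/host key
export ESMGUID=$(uuidgen)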
Don't know how much this helps you, but it's an answer 🙂
Nik
01-16-2018
02:28 PM
1 Like
Hi,
At Boemska we offer a product called Enterprise Session Monitor for SAS. It's a piece of software that plugs into your SAS Environment and profiles the resource utilisation of individual SAS jobs, producing timeseries data which our customers use to optimise job performance, often focusing on single problem steps like the one you describe.
ESM records and visualises the CPU/memory utilisation, temp directory size and IO throughput of each individual job, allowing you to contrast it with the resource profile of the node it's executing on, showing metrics like iowait, per-device throughput (for both storage and network devices), disk queue lengths and cache/swap size. The data is very granular (2s intervals) and the interactive investigative workflow makes root cause analysis a relatively pleasant experience.
We're a SAS partner organisation and this is a separate proprietary product, but we offer a free 60-day trial, meaning you could take it for a spin for a couple of months with a view to resolving your immediate issue, no strings attached. Feel free to contact me privately if you're interested.
Nik
01-14-2018
02:05 PM
4 Likes
Hi,
From the sound of it, you would be trying to authenticate against SAS from an HTTP client such as a browser-based app (or the JS console) rather than PROC HTTP. I expect your confusion comes from the paper I think you meant to link to, which uses PROC HTTP to show RESTful interaction with the SASLogon webapp, but only to demonstrate the mechanism of HTTP interaction rather than any particularly useful use case in terms of app development.
What I would suggest you do, if for whatever reason you choose to not go down the H54S route, is show the mechanism presented in this paper to your front end / GUI developers so that they can emulate it (much like our adapter does). To use the RESTful approach outlined in that paper they'll need to do the following:
1. Issue a POST request to the SASLogon webapp, specifically to /SASLogon/v1/tickets, to get a 'ticket granting ticket'. In the POST request body, include the username and password parameters as specified by that paper.
2. This 'ticket granting ticket' will be communicated back by the SASLogon app via the response headers, as the Location parameter. You can use Postman to test this yourself. Make sure that you/they set the Content-Type header to application/x-www-form-urlencoded for this to work. In Postman, this first request looks like this:
As you can see, the app returns a 'Location' property in the response headers, as described in the paper. Your front end developers will need to extract that Location URL using JavaScript.
3. Following this, they will need to issue another POST request to the URL they extracted, this time with one parameter, service, where they will specify which app endpoint they would like to be authenticated to use. If it's, for example, the Stored Process WebApp, that request will look like this:
4. You'll notice that this request now returns a ticket ID in the response body. Your front end guys, again, will need to extract that ticket and append it to the URL of the target application they're looking to communicate with. You now have a ticket you can use to talk to the application you requested. Here's what that looks like for the SPWA:
That's it. One thing I'm not sure about is how this approach manages timeouts etc., so you may need to capture 302 redirects back to the SASLogon app when a user has been idle for long enough to time out. You may want to show your developers how we queue any expired requests for execution after a successful re-logon, so that they can implement something similar.
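If they want to sanity-check the flow outside Postman first, the same three requests look roughly like this in curl (host, credentials and ticket values are placeholders):
# 1. Request a ticket granting ticket; the TGT URL comes back in the Location response header
curl -si -H "Content-Type: application/x-www-form-urlencoded" \
  -d "username=myuser&password=mypass" \
  https://yourserver/SASLogon/v1/tickets | grep -i '^Location'
# 2. POST the service you want to the TGT URL from step 1; the response body is a service ticket (ST-...)
curl -s -d "service=https://yourserver/SASStoredProcess/do" \
  https://yourserver/SASLogon/v1/tickets/TGT-xxxx
# 3. Append that ticket to the target application's URL
curl -s "https://yourserver/SASStoredProcess/do?ticket=ST-xxxx"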
Just for completeness, when using the Adapter this process involves the following JavaScript code:
adapter.login('myusername','mypassword');
Hope this helps you.
Nik
11-14-2017
04:41 PM
3 Likes
You probably figured this out already, but it looks to me like you've got a Windows depot there 🙂
11-14-2017
01:45 PM
What do you see when you run this:
ls /usr/src/sas_m5/products/privatejre*