k4minou
Obsidian | Level 7

Hello,

 

Before asking my noob questions, here is the context:
A few months ago we had disk usage issues (especially in the workspace). To fix them, the administrators moved all our workspace to another disk, one mainly dedicated to archived data. We do not know how fast these disks are, nor whether there are other settings to pay attention to that could explain the odd behaviour below; as end users we have zero visibility.

 

We have noticed a few odd behaviours with the same SQL and/or data step (i.e., the same code):
- the calculations are extremely slow (about 3x the run time)
- the final size of the tables increases insanely (about 10x the size)

 

Could it be a server-side issue? I mean, could it be related to the disk properties? I could understand the calculations slowing down because of I/O, but why would the table sizes also increase?

 

I have already done a quick test: the sum of the data is the same before and after (as is the total number of observations), but the size is not... And above all, the code is the same... so why?
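For the record, a rough sketch of the kind of check I did (the library names and the AMOUNT variable are placeholders):

/* compare totals and row counts between the old and new copies
   (oldlib/newlib and AMOUNT are placeholders) */
proc means data=oldlib.mytable n sum;
   var amount;
run;

proc means data=newlib.mytable n sum;
   var amount;
run;

/* value-by-value comparison of the two datasets */
proc compare base=oldlib.mytable compare=newlib.mytable;
run;

Both copies come out identical in content, yet the file sizes differ.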

 

I am not an administrator, so I have no clue at all; any suggestions/feedback are welcome.

 

Thank you for reading

Kurt_Bremser
Super User

You can have the same code and get completely different results.

If you run proc import against a CSV file where a certain string column never has more than 10 characters, and next month there's one value with 32000 characters, the whole column will blow up to that length, and your dataset file size increases by several orders of magnitude.
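For illustration, something like this (the file path and output name are made up) re-guesses variable attributes on every run, so a single long value inflates the column:

/* proc import guesses lengths from the data each time it runs */
proc import datafile="/data/monthly_extract.csv"   /* placeholder path */
            out=work.extract
            dbms=csv
            replace;
   guessingrows=max;  /* scans all rows; one 32000-char value sets the column length */
run;

/* check which lengths were actually assigned this month */
proc contents data=work.extract;
run;

If the assigned lengths differ between months, that alone explains a large size difference.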

 

So you need to take a look at the data, and some of your system options. If you had compress set as a system option, but now you don't, dataset files will grow although the logical content is the same.

Compare dataset metadata (variables, variable lengths, observations, compression) between then and now.
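A quick way to do that comparison, sketched with placeholder library/dataset names:

/* dump the metadata of both copies and diff them */
proc contents data=oldlib.mytable out=meta_old(keep=name type length) noprint;
run;
proc contents data=newlib.mytable out=meta_new(keep=name type length) noprint;
run;

proc compare base=meta_old compare=meta_new;
run;

/* and check the current COMPRESS setting of the session */
proc options option=compress;
run;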

 

Bad performance of steps can be caused by slow disks behind WORK. WORK needs the fastest storage you can get your hands on (nowadays, that means SSDs); if you want high availability, use RAID 1 (mirroring), not RAID 5.
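To see where WORK currently points (useful evidence when talking to storage admins), you can run:

/* show the current WORK setting and its physical path */
proc options option=work;
run;

%put WORK is at %sysfunc(pathname(work));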

Anand_V
Ammonite | Level 13

Hi @k4minou ,

I think it's better you reach out to your site storage admin to understand what changes have been made on the new disk compared to the old one. Settings can then be compared and updated as required.

 

In order to test I/O, SAS provides test utilities for Windows & UNIX which you can use to test and validate against your new disk. If you don't have access to the server, you can pass these on to your site admin.
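Failing that, a crude timing test written directly in SAS (my own sketch, not the official utilities) can already show a difference between the old and new disks:

/* write and re-read roughly 1 GB in WORK, with full resource statistics */
options fullstimer;

data work.iotest;
   length pad $ 1000;
   pad = repeat("x", 999);
   do i = 1 to 1000000;   /* ~1 GB uncompressed */
      output;
   end;
run;

data _null_;
   set work.iotest;   /* read it back */
run;

FULLSTIMER writes real and CPU times to the log; a large gap between them usually points at I/O waits.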

 

Thanks!

k4minou
Obsidian | Level 7

Thank you both for your time, @Kurt_Bremser @Anand_V

 

Actually, I am a consultant; I work with existing projects/code in an already existing environment.

 

The administrators here do not want to change things, mainly because they do not know SAS; they work from SAS's recommendations. So as long as things "work", it's fine for them (even if it's slow, etc.).

 

@Kurt_Bremser I agree with you about the import; that could change and explain a lot of things. In my case, I am talking about the same code, same data, and same results, but with two differences: slowness, and a bigger final dataset size.

 

@Anand_V yes, you are probably right, and I would do the same, but as I said, I just cannot for now. I have to bring out more, and more relevant, "proof". Otherwise, they will reply that it is all the fault of my crappy code (not mine, but existing code...).

 

To make them agree to move the WORK, and with no visibility on the server as an end user, I have had to inject DOS commands to get a better idea of who uses what, which resources, etc. (cf. my previous question about the security implications of injecting code into a SAS data step).
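Roughly like this (the command here is just an example, and XCMD has to be allowed on the server):

/* run an OS command from SAS and capture its output */
filename oscmd pipe 'tasklist /v';   /* Windows: list running processes */

data work.procs;
   infile oscmd truncover;
   length line $512;
   input line $512.;
run;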

 

The project here is really political, I would say 😕

 

Anyway, thank you, I will do further tests 😃

Kurt_Bremser
Super User

@k4minou wrote:


@Kurt_Bremser I agree with you about the import; that could change and explain a lot of things. In my case, I am talking about the same code, same data, and same results, but with two differences: slowness, and a bigger final dataset size.

 

Then it's either not SAS, or you have different SAS system options (compress) set. I have never experienced anything but the most minuscule changes in dataset file size when moving from one disk to another (e.g., more overhead because of larger allocation units). When there were drastic changes, they happened to small datasets (a change from a 64K page size to a 128K page size will double a SAS dataset that only needs one page).
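You can demonstrate the page-size effect yourself, for example:

/* a one-page dataset doubles in file size when the page size doubles */
data work.tiny_64k(bufsize=64k);
   do i = 1 to 10;
      output;
   end;
run;

data work.tiny_128k(bufsize=128k);
   set work.tiny_64k;
run;

/* compare "Data Set Page Size" and file size in the output */
proc contents data=work.tiny_64k;
run;
proc contents data=work.tiny_128k;
run;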

 

One thing that came to my mind is the use of "sparse files". When that is set up at the filesystem level (meaning outside of SAS!), empty sectors are not written to disk but only recorded in the file allocation data, which reduces file size considerably if such empty allocation units are present in the data.

 

It could also be a matter of how file sizes are reported, if the filesystems themselves are compressed and one reports the compressed size while the other reports the uncompressed net size.
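On UNIX you can check both effects from within SAS through a pipe (the path is a placeholder); if du reports far less than ls, sparse files or filesystem compression are in play:

/* compare allocated size (du) with apparent size (ls) for one dataset file */
filename sz pipe 'du -k /sasdata/mylib/mytable.sas7bdat; ls -l /sasdata/mylib/mytable.sas7bdat';

data _null_;
   infile sz;
   input;
   put _infile_;
run;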

 

And I can't see SAS recommending anything but the fastest disks available for WORK; I suspect your lazy admins are just feeding you bullshit, or lack the basic competence for their job. Something that sadly happens often in the Windows world, where everybody thinks it's so easy that the biggest idiot can do it, while in fact it's the most complicated platform to work with 😞

 

I suggest submitting your problem to SAS technical support, and then putting their findings to your admins (unless it's determined that a SAS system option or setting was responsible).

 

 

Am I happy that I am the SAS and system admin in one person! When TSHTF, it's always my job to set it right. And I usually know what I'm doing, after lots of encounters with reality (aka "experience", see Maxim 41).

k4minou
Obsidian | Level 7

Ha ha! You said it =D

 

Thank you. I do not understand why they are so lazy here...

 

I have dug out, by chance, an old report from SAS (written by a SAS consultant I know really well, because we worked together as Accenture staff before he joined SAS ^^).

 

The report was done in 2012! Everything we are saying here was recommended by him back then, especially splitting/partitioning the storage by usage... But nothing was done; the administrators "forgot" it, from what they say. It is 2019... Nothing has been done... Not even SQL pass-through to access DB2 =/
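For reference, explicit pass-through would look something like this (connection parameters and table names are placeholders):

/* explicit SQL pass-through: DB2 does the filtering, SAS only receives the result */
proc sql;
   connect to db2 (database=proddb user=myuser password=XXXXXX);
   create table work.extract as
   select * from connection to db2 (
      select acct_id, balance
      from schema1.accounts
      where balance > 0
   );
   disconnect from db2;
quit;

With implicit access through a libname, SAS may pull far more rows across the network before filtering.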

 

And you know what? To avoid carrying out the wise recommendations, they modified the report and also replied with a lot of questions... One way of not doing things here is simply to flood people with questions...

 

When I met the author of the report, I asked him directly for the final report to see the differences; stunning! lol

 

It is the first time I have had such a bad experience on a SAS project ='(

 

For now, I will just gather as many facts as possible... to make things move (or not...)
