About pcur

pcur · ‎09-28-2016

Hi Zeke, thanks for the input. I did arrive at a solution after contacting SAS tech support. The problem was an I/O mismatch arising from the use of proc nlp with huge datasets. I was able to correct it by specifying the -SGIO option on the command line when launching SAS. This enables the scatter-read / gather-write method, which can improve I/O throughput when working with very large files. I hope this is helpful to others who encounter similar issues, though I suspect it will be specific to the use of proc nlp with large (~50 GB) datasets.

To clarify, this was occurring in a desktop Windows environment with SAS 9.4. It was unrelated to RAM usage or available memory (I have 16 GB, and no more than 8-10 GB was in use when the proc failed). It was also unrelated to log output, which, as I indicated, was redirected to a dummy file via proc printto, a step I suggest for anyone running large, multi-step simulations that would otherwise quickly choke the log.
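For anyone trying the same fix, here is a minimal sketch of how the option might be applied and then checked from inside the session. The install path shown in the comment is an assumption and will differ between machines; PROC OPTIONS simply reports the current value of a system option.

```sas
/* Assumed launch command on Windows (the install path is hypothetical):
     "C:\Program Files\SASHome\SASFoundation\9.4\sas.exe" -sgio
   The same option can instead be placed on its own line in the config file. */

/* After start-up, write the current SGIO setting to the log */
proc options option=sgio;
run;
```

If the log shows NOSGIO, the option was not picked up at start-up.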

pcur · ‎08-01-2016

Thanks, I appreciate your helpful comments. I'm not sure that this is a scaling issue, per se, as the dissimilar properties of the simulated datasets are reproduced if I standardize my dataset before calculating the correlation and covariance matrices from which I simulate.

pcur · ‎07-29-2016

My goal has been to take the correlation matrix from an existing (empirical) multivariate dataset and use it to generate a centered and standardized (mean=0, SD=1) simulated dataset. The code I use to do so is copied below. I have been using the correlation matrix of my real data as the input to RandNormal, and when I do so my output dataset looks exactly as one would imagine, i.e. means around 0 and SDs of 1, with the same correlation structure as the original dataset. However, I realize RandNormal was originally intended to accept the covariance matrix, not the correlation matrix, as its input. When I use the covariance matrix as input to RandNormal I find some unexpected results: the standard deviations of my simulated variables now vary quite a bit, from 0.39 to 1.09, though my means still hover around 0 and the simulated correlation matrix is as expected.

My question is: why does the variability in my simulated data seem to increase with the use of the covariance matrix, and how can I account for this? I am concerned that the data generated with the correlation matrix may yield unexpected linear dependencies.

Here is the code I use, which I obtained both from this forum and from The Do Loop blog (http://blogs.sas.com/content/iml/):

    proc iml;
    call randseed(4321);
    /* specify population mean and covariance */
    use simfin.covmat; * <------here I either use the correlation or covariance matrix. The cov matrix is poorly standardized.;
    read all var _num_ into Cov[c=varNames];     /* save var names */
    close simfin.covmat;
    Mean = j(nrow(Cov),1,0);                     /* zero vector */
    N = 500;                                     /* sample size */
    NumSamples = 1;                              /* number of samples/replicates */
    X = RandNormal(N*NumSamples, Mean, Cov);
    ID = colvec(repeat(T(1:NumSamples), 1, N));  /* 1,1,1,...,2,2,2,...,3,3,3,... */
    Z = ID || X;
    varNames = "ID" || varNames;                 /* concatenate "ID" to var names */
    create MVN from Z[c=varNames];
    append from Z;
    close MVN;
    quit;
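One point that may help frame the question: for multivariate normal data drawn with a given covariance matrix, each variable's standard deviation is the square root of the corresponding diagonal element, so a covariance matrix with unequal diagonal entries yields unequal SDs by construction. A correlation matrix is the covariance matrix of the standardized variables, which is why it yields SDs near 1. The sketch below uses a made-up 3x3 covariance matrix (not the poster's data) to show the implied SDs and the conversion to a correlation matrix.

```sas
proc iml;
   /* Hypothetical covariance matrix, for illustration only */
   Cov = {1.00 0.30 0.20,
          0.30 0.25 0.10,
          0.20 0.10 0.64};

   sd   = sqrt(vecdiag(Cov));     /* SDs implied by Cov: 1, 0.5, 0.8 */
   Corr = Cov / (sd * T(sd));     /* elementwise division by the outer product of SDs */

   print sd, Corr;
quit;
```

Running the same two lines on the real covariance matrix would show whether the range of simulated SDs simply mirrors the diagonal of the input.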

pcur · ‎07-18-2016

I am running a fairly large simulation study. There are multiple procs involved, including bootstrapping, randomized sampling, proc nlp, and proc genmod, as well as a series of data steps in between, some of which generate fairly large datasets. Given the above, and since my base dataset is approx. 48 GB, I installed a new 4 TB drive to accommodate memory usage. This appears to be more than sufficient, as running my whole simulation and assorted data steps/procs never uses more than 250 GB. (This 4 TB drive is otherwise empty, excepting the 500 GB allocated for the OS and other programs/data, so really 3.5 TB is available.)

I nonetheless am getting this error at a predictable point, i.e., always on BY step XXX, in the midst of PROC NLP. It reads:

ERROR: File Work.'SASTMP-0001108070'n.UTILITY is damaged. I/O processing did not complete. The SAS System stopped processing this step because of errors.

To address this apparent memory issue I've added the memsize MAX option to the SASv9.cfg shortcut. I've also used a user-specified work directory (e.g. libname xx 'c:\xx'; options user=xx;). I also usually run the simulation with a proc printto that directs my log to a dummy file. Despite the above measures, and apparently ample available resources (again, literally terabytes of memory are still available on this single-user Windows PC), I am getting this seemingly memory-related error message. Any insights or suggestions would be much appreciated.
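For readers unfamiliar with the two workarounds mentioned above, a minimal sketch follows. The folder and log file names are hypothetical placeholders. Note also that, as far as I know, utility files like the one named in the error are governed by the WORK/UTILLOC locations rather than by USER=, so this step may not relocate them.

```sas
/* Hypothetical folder; substitute a directory on the large drive */
libname xx 'c:\xx';
options user=xx;        /* one-level dataset names now go to XX instead of WORK */

/* Send the log to a throwaway file (hypothetical file name) */
proc printto log='c:\xx\dummy.log' new;
run;

/* ... simulation steps would run here ... */

/* Restore the default log destination */
proc printto;
run;
```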

pcur · ‎05-06-2016

Much appreciated!

pcur · ‎05-06-2016

Hi Rick, thanks for your reply! One thing I'm not following in your script is how the original correlation matrix is preserved in the generated sample. The sample correlation matrix I'm generating from has 34 variables, and I need the correlation pattern in those variables preserved in the generated data.

pcur · ‎05-06-2016

Hi Folks, I'm very new to PROC IML but am attempting to simulate a dataset based on a preexisting correlation matrix. I found a nice macro for this (corr2data) at http://www.ats.ucla.edu/stat/sas/macros/corr2data_demo.htm. I've simplified it slightly for my use, as provided below. This works well for me, but I need to adapt it to simulate hundreds of datasets from the same correlation matrix and assign a sample ID to each simulated dataset. I used a %Do loop to achieve this, and in combination with a data step this works, but I'm failing at making a sampleID variable to append to each row of the output dataset (e.g., so that all rows generated in the first iteration are labeled "1", and all rows generated in the 13th iteration are labeled "13"). Here's what I've got so far. Help much appreciated.

The macro call:

    %SIMUDATA(simu.Fakenorm, simu.Rmat, 1051);

The macro:

    %macro SIMUDATA(outdata, corrmat, n);
    %Do index=1 %to 20 %by 1;
    proc iml;
    use &corrmat;
    read all var _num_ into C;
    rn = nrow(C);
    cn = ncol(C);
    p = root(C);
    dim = nrow(C);
    myvar = rannor(J(&n, dim, 0));
    do i = 1 to dim;
      myvar[, i] = myvar[,i]-(sum(myvar[,i])/&n);
    end;
    XX = (t(myvar)*myvar)/(&n-1);
    U = root(inv(XX));
    Y = myvar*T(U);
    T = Y*p;
    * S=J(dim, 1, &index);   <---- I made this in an attempt to create a vector of nrow length labeled with the index number
    * V=insert(T,S,0,35);    <---- I made this to append that to the simulated dataset
    create &outdata from V;
    append from V;
    quit;
    data simu.cumu;
      set simu.cumu &outdata;
    run;
    %end;
    %mend;
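A minimal sketch of one way the ID column could be built, offered as an illustration and not as the answer given in the thread. One likely problem with the attempt above is that J(dim, 1, &index) has one row per variable, whereas the ID column needs one row per simulated observation (&n). The names n, index, and sim below are hypothetical stand-ins for &n, &index, and the simulated matrix.

```sas
proc iml;
   n     = 5;                 /* rows per simulated sample (stands in for &n) */
   index = 13;                /* current iteration (stands in for &index) */
   sim   = j(n, 3, 0);        /* placeholder for the simulated data matrix */

   S = j(n, 1, index);        /* n x 1 column, every element equal to index */
   V = S || sim;              /* horizontal concatenation: ID column first */

   print V;
quit;
```

Inside the macro, the same idea would use J(&n, 1, &index) and concatenate it with the simulated matrix before the CREATE statement.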

Online Status	Offline
Date Last Visited	‎09-28-2016 11:47 AM

Re: ERROR: FILE WORK.'SAS.TMP-000110870'n.Utility is damaged. I/O proc...

Re: Covariance vs. Correlation matrices for Simulations with RandNorma...

Covariance vs. Correlation matrices for Simulations with RandNormal in...

ERROR: FILE WORK.'SAS.TMP-000110870'n.Utility is damaged. I/O processi...

Re: IML data simulation Do Loop Fail

Re: IML data simulation Do Loop Fail

IML data simulation Do Loop Fail
