For reasons too silly to explain, I must import several correlation matrices into SAS that were originally generated by Mplus. The problem is that when Mplus outputs such a matrix, it stores only the lower half of the matrix in the following format (first 9 variables only). corr.dat: 0.10000000E+01 0.94232283E+00 0.10000000E+01 0.93041547E+00 0.94801541E+00 0.10000000E+01 0.74084188E+00 0.74687603E+00 0.68597923E+00 0.10000000E+01 0.85682659E+00 0.85743307E+00 0.84079052E+00 0.74511362E+00 0.10000000E+01 0.79426036E+00 0.78624117E+00 0.75429962E+00 0.72166804E+00 0.84584134E+00 0.10000000E+01 0.56886450E+00 0.56152217E+00 0.56380513E+00 0.47751877E+00 0.46174713E+00 0.41570092E+00 0.10000000E+01 0.54628978E+00 0.55188666E+00 0.55358386E+00 0.50515092E+00 0.50253479E+00 0.39646716E+00 0.94309219E+00 0.10000000E+01 0.53257041E+00 0.54576132E+00 0.54605514E+00 0.50582361E+00 0.49268571E+00 0.37712732E+00 0.90704115E+00 0.95095093E+00 0.10000000E+01 Two obvious problems here: 1) the values are in scientific notation. 2) the column structure of the file (i.e. not a square or even lower-half-only matrix structure). I'm trying to write code to input the raw correlation data and convert it to an actual correlation matrix in SAS for further analysis. So far I've been using "find and replace" in corr.dat to create a file that looks like this: 0.10000000 , +01 , 0.94232283 , +00 , 0.10000000 , +01 , 0.93041547 , +00 , 0.94801541 , +00 0.10000000 , +01 , 0.74084188 , +00 , 0.74687603 , +00 , 0.68597923 , +00 , 0.10000000 , +01 0.85682659 , +00 , 0.85743307 , +00 , 0.84079052 , +00 , 0.74511362 , +00 , 0.10000000 , +01 0.79426036 , +00 , 0.78624117 , +00 , 0.75429962 , +00 , 0.72166804 , +00 , 0.84584134 , +00 0.10000000 , +01 , 0.56886450 , +00 , 0.56152217 , +00 , 0.56380513 , +00 , 0.47751877 , +00 0.46174713 , +00 , 0.41570092 , +00 , 0.10000000 , +01 , 0.54628978 , +00 , 0.55188666 , +00 0.55358386 , +00 , 0.50515092 , +00 , 0.50253479 , +00 , 0.39646716 , +00 , 0.94309219 , +00 0.10000000 , +01 , 0.53257041 , +00 , 0.54576132 , +00 , 0.54605514 , +00 , 0.50582361 , +00 0.49268571 , +00 , 0.37712732 , +00 , 0.90704115 , +00 , 0.95095093 , +00 , 0.10000000 , +01 Next, I import the data and create a temp dataset, calculating the actual correlations using the exponent values: PROC IMPORT OUT= WORK.corr DATAFILE= "c:\temp\CORR.dat" DBMS=DLM REPLACE; DELIMITER='2C'x; GETNAMES=NO; DATAROW=1; RUN; data temp; set corr; array correl (5) var1 var3 var5 var7 var9; array expon (5) var2 var4 var6 var8 var10; array new (5) new1-new5; do i = 1 to 5; new(i) = correl(i)*(10**expon(i)); end; drop i; keep new1-new5; run; So now my temp data looks like this (correct values, wrong structure): 1 0.94232283 1 0.93041547 0.94801541 1 0.74084188 0.74687603 0.68597923 1 0.85682659 0.85743307 0.84079052 0.74511362 1 0.79426036 0.78624117 0.75429962 0.72166804 0.84584134 1 0.5688645 0.56152217 0.56380513 0.47751877 0.46174713 0.41570092 1 0.54628978 0.55188666 0.55358386 0.50515092 0.50253479 0.39646716 0.94309219 1 0.53257041 0.54576132 0.54605514 0.50582361 0.49268571 0.37712732 0.90704115 0.95095093 1 The part I'm stuck on is rearranging the values in the above temp dataset to form the lower-triangle correlation matrix I need to move on with the analysis (I already have IML code to generate the full square matrix once I obtain the lower half). In looking for an answer it seemed like a good idea to export and then re-import the data as below, but the following is clearly too simple (and results in a NOTE: LOST CARD error in the log). Working code would keep the "1" values on the diagonal, creating a new observation beginning with the next data point after encountering a "1" value. The code would then read all subsequent values encountered (over multiple lines of corr2.dat) as part of the same new observation, up to and including the next "1" value. This process should repeat until the end of the file. data _null_; set temp; file 'c:\temp\corr2.dat'; put new1-new5; run; data temp2; infile 'c:\temp\corr2.dat' flowover; input var1-var9; run; Since I have to do this with quite a few different datasets containing different numbers of variables, it would ultimately be fantastic if the code could just handle the original corr.dat file (sample pasted at the very top of this post) directly and then do the necessary calculation and restructuring. I am sure there is a way to get SAS to do this, but my INFILE skills are clearly not up to the task. I've also played around with the DLMSTR= option after creating a character delimiter via find-and-replace in corr2.dat, to no avail. THANKS for any help!
... View more