I have the following data that i need to import.
It is located in the middle of a text document.
Knowing the line on which the data starts, how do i go about importing it?
===[Summary Statistics by Model]=== The three-parameter logistic (3PL) model: New Form Old Form -------------------------- -------------------------- a_j b_j c_j a_j b_j c_j N.Obs. 10 10 10 10 10 10 Mean 1.0000 -0.5549 2.0000 2.0000 -0.8378 0.4848 Std.Dev. 0.8737 0.7388 0.4899 1.0000 0.4838 1.9849
I have tried the following code, using both tab and space delimiters.
Where LineEnd is the number of the last row of data and LineStart is the first row of data.
Neither Tab nor Space delimiters produce the desired result.
Tab gives a table with all the data in a single variable.
Space gives a table with 53 variables, and the data scattered throughout.
Any help appreciated.
options obs=&LineEnd.; proc import datafile= "FilePath" dbms=dlm out=Test replace; getnames=yes; datarow=&LineStart.; guessingrows=max; run;
@mcook wrote:
I have the following data that i need to import.
It is located in the middle of a text document.
Knowing the line on which the data starts, how do i go about importing it?
===[Summary Statistics by Model]=== The three-parameter logistic (3PL) model: New Form Old Form -------------------------- -------------------------- a_j b_j c_j a_j b_j c_j N.Obs. 10 10 10 10 10 10 Mean 1.0000 -0.5549 2.0000 2.0000 -0.8378 0.4848 Std.Dev. 0.8737 0.7388 0.4899 1.0000 0.4838 1.9849
I have tried the following code, using both tab and space delimiters.
Where LineEnd is the number of the last row of data and LineStart is the first row of data.
Neither Tab nor Space delimiters produce the desired result.
Tab gives a table with all the data in a single variable.Space gives a table with 53 variables, and the data scattered throughout.
Any help appreciated.
options obs=&LineEnd.; proc import datafile= "FilePath" dbms=dlm out=Test replace; getnames=yes; datarow=&LineStart.; guessingrows=max; run;
Proc import is not going to be of much use, as in none.
Given that text example what should the output data set look like, as in variable names and values?
Will you have other similar documents? Will they have the same number of parameters? Statistics (the row labels>)?
If this is a single exercise, or other files are exactly the same in layout, you can use a data step with an infile statement to tell SAS which row data starts on with the Firstobs= option.
Here is one way to read multiple lines into specific variables. I used N for "new" and O for "old" and just numbered the variables just to demonstrate the code. The suffix in the variables names indicates which statistic is read from that line.
The / at the start of the variables in the input statement says "go to next line of the source to read".
Data example; input Stat $ N1_n N2_n N3_n O1_n O2_n O3_n / Stat $ N1_mean N2_mean N3_mean O1_mean O2_mean O3_mean / Stat $ N1_std N2_std N3_std O1_std O2_std O3_std ; drop stat; datalines; N.Obs. 10 10 10 10 10 10 Mean 1.0000 -0.5549 2.0000 2.0000 -0.8378 0.4848 Std.Dev. 0.8737 0.7388 0.4899 1.0000 0.4838 1.9849 ;
In reading an external file instead of having the DATALINES you would have an infile statement similar to
Infile "<path to file>\filename.txt" Firstobs = xx;
The XX would be the line in the file where the first row of values would be encountered such as the N.Obs.
There are other ways. If you wanted a dynamic, i.e. the number of variables in the "new" and "old" blocks differ it could be done but the data structure would be different.
There will be many documents similar to this one. Though all should have the same format and information, just different values.
Originally I wanted to import it as it appears. But without changing variable names to distinguish between the Old and New Form variables, I think it best to import it as 2 separate tables., each with N.Obs, Mean, Std.Dev. and variables a_j, b_j, c_j.
You need custom code for this.
Nothing complicated, something like this:
data _null_;
file "&wdir\t.txt";
put '==[Summary Statistics by Model]=== ';
put ' ';
put 'The three-parameter logistic (3PL) model: ';
put ' New Form Old Form ';
put ' -------------------------- -------------------------- ';
put ' a_j b_j c_j a_j b_j c_j ';
put 'N.Obs. 10 10 10 10 10 10 ';
put 'Mean 1.0000 -0.5549 2.0000 2.0000 -0.8378 0.4848 ';
put 'Std.Dev. 0.8737 0.7388 0.4899 1.0000 0.4838 1.9849 ';
run;
data WANT;
length VAR1-VAR18 $8;
retain VAR1-VAR18;
infile "&wdir\t.txt";
input;
if _INFILE_ = '==[Summary Statistics by Model]===' then FOUND+1;
if _INFILE_ =:'N.Obs. ' & FOUND then do;
VAR1 =scan(_INFILE_,2,' ');
VAR2 =scan(_INFILE_,3,' ');
VAR3 =scan(_INFILE_,4,' ');
VAR4 =scan(_INFILE_,5,' ');
VAR5 =scan(_INFILE_,6,' ');
VAR6 =scan(_INFILE_,7,' ');
end;
if _INFILE_ =:'Mean' & FOUND then do;
VAR7 =scan(_INFILE_,2,' ');
VAR8 =scan(_INFILE_,3,' ');
VAR9 =scan(_INFILE_,4,' ');
VAR10=scan(_INFILE_,5,' ');
VAR11=scan(_INFILE_,6,' ');
VAR12=scan(_INFILE_,7,' ');
end;
if _INFILE_ =:'Std.Dev.' & FOUND then do;
VAR13=scan(_INFILE_,2,' ');
VAR14=scan(_INFILE_,3,' ');
VAR15=scan(_INFILE_,4,' ');
VAR16=scan(_INFILE_,5,' ');
VAR17=scan(_INFILE_,6,' ');
VAR18=scan(_INFILE_,7,' ');
output;
stop;
end;
run;
Obs | VAR1 | VAR2 | VAR3 | VAR4 | VAR5 | VAR6 | VAR7 | VAR8 | VAR9 | VAR10 | VAR11 | VAR12 | VAR13 | VAR14 | VAR15 | VAR16 | VAR17 | VAR18 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 10 | 10 | 10 | 10 | 10 | 10 | 1.0000 | -0.5549 | 2.0000 | 2.0000 | -0.8378 | 0.4848 | 0.8737 | 0.7388 | 0.4899 | 1.0000 | 0.4838 | 1.9849 |
Or if you want to mix both logics, something like this:
data WANT;
infile "&wdir\t.txt";
input @;
if _INFILE_ = '==[Summary Statistics by Model]===' then FOUND+1;
if _INFILE_ =:'N.Obs. ' & FOUND;
input @1 STAT $ NEW1 NEW2 NEW3 OLD1 OLD2 OLD3; output;
input @1 STAT $ NEW1 NEW2 NEW3 OLD1 OLD2 OLD3; output;
input @1 STAT $ NEW1 NEW2 NEW3 OLD1 OLD2 OLD3; output;
stop;
run;
This is a case where use of the @ "string" feature of the INPUT statement is just the right tool:
data want;
input @'N.Obs.' n1-n6
@'Mean' mu1-mu6
@'Std.Dev.' std1-std6 ;
datalines;
===[Summary Statistics by Model]===
The three-parameter logistic (3PL) model:
New Form Old Form
-------------------------- --------------------------
a_j b_j c_j a_j b_j c_j
N.Obs. 10 10 10 10 10 10
Mean 1.0000 -0.5549 2.0000 2.0000 -0.8378 0.4848
Std.Dev. 0.8737 0.7388 0.4899 1.0000 0.4838 1.9849
run;
It doesn't matter how far down the "N.Obs." line is. All of them will be skipped by the input statement until the "N.Obs." is encountered.
As long as you always have three lines starting with "N.Obs.", "Mean", and "Std.Dev.", respectively, and they are in that order (though not necessarily consecutive), and you always have 6 data columns in each, this works just fine.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.