mcook
Quartz | Level 8

 

I have the following data that I need to import.

It is located in the middle of a text document.

Knowing the line on which the data starts, how do I go about importing it?

 

===[Summary Statistics by Model]===

The three-parameter logistic (3PL) model:
                       New Form                     Old Form
             --------------------------     --------------------------
               a_j       b_j       c_j        a_j       b_j       c_j
N.Obs.           10        10        10         10        10        10
Mean         1.0000   -0.5549    2.0000     2.0000   -0.8378    0.4848
Std.Dev.     0.8737    0.7388    0.4899     1.0000    0.4838    1.9849

 

I have tried the following code, using both tab and space delimiters, where LineEnd is the number of the last row of data and LineStart is the first.

Neither tab nor space delimiters produce the desired result: tab gives a table with all the data in a single variable, and space gives a table with 53 variables, with the data scattered throughout.

Any help appreciated.

 

options obs=&LineEnd.;
proc import datafile= "FilePath" 
	dbms=dlm
	out=Test replace;
	delimiter='09'x; /* also tried delimiter=' ' */
	getnames=yes;
	datarow=&LineStart.;
	guessingrows=max;
run;

 

6 REPLIES
ballardw
Super User


Proc import is not going to be of much use, as in none.

Given that text example what should the output data set look like, as in variable names and values?

Will you have other similar documents? Will they have the same number of parameters? The same statistics (the row labels)?

If this is a single exercise, or the other files are exactly the same in layout, you can use a data step with an INFILE statement, which can tell SAS which row the data starts on via the FIRSTOBS= option.

 

Here is one way to read multiple lines into specific variables. I used N for "new" and O for "old" and numbered the variables just to demonstrate the code. The suffix in the variable names indicates which statistic is read from that line.

The / at the start of a line of variables in the INPUT statement says "go to the next line of the source to read".

Data example;
   input Stat $ N1_n N2_n N3_n O1_n O2_n O3_n
      /  Stat $ N1_mean N2_mean N3_mean O1_mean O2_mean O3_mean
      /  Stat $ N1_std N2_std N3_std O1_std O2_std O3_std
   ;
   drop stat;
datalines;
N.Obs.           10        10        10         10        10        10
Mean         1.0000   -0.5549    2.0000     2.0000   -0.8378    0.4848
Std.Dev.     0.8737    0.7388    0.4899     1.0000    0.4838    1.9849
;

When reading an external file instead of using DATALINES, you would have an INFILE statement similar to:

Infile "<path to file>\filename.txt" Firstobs = xx;

where xx is the line in the file where the first row of values (the N.Obs. row) is encountered.
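Putting those pieces together with the DATALINES example above, a minimal sketch — the path and the line numbers in FIRSTOBS=/OBS= are placeholders you would replace with your actual values:

```
data example;
   /* read only lines 7-9 of the file, where the three value rows sit */
   infile "C:\data\stats.txt" firstobs=7 obs=9;
   input Stat $ N1_n N2_n N3_n O1_n O2_n O3_n
      /  Stat $ N1_mean N2_mean N3_mean O1_mean O2_mean O3_mean
      /  Stat $ N1_std N2_std N3_std O1_std O2_std O3_std
   ;
   drop stat;
run;
```

One iteration of the step consumes all three lines (via the / pointer), so the output is a single observation with 18 numeric variables.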

 

There are other ways. If you wanted it dynamic, i.e. allowing the number of variables in the "new" and "old" blocks to differ, that could be done, but the data structure would be different.

 

mcook
Quartz | Level 8

There will be many documents similar to this one, though all should have the same format and information, just different values.

Originally I wanted to import it as it appears. But without changing variable names to distinguish between the Old and New Form variables, I think it best to import it as two separate tables, each with N.Obs., Mean, and Std.Dev. rows and variables a_j, b_j, c_j.
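A sketch of that two-table import, assuming a placeholder path and that the N.Obs. row is line 7 of the file (adjust FIRSTOBS= and OBS= to the real line numbers):

```
data New(keep=Stat a_j b_j c_j) Old(keep=Stat a_j b_j c_j);
   length Stat $ 10;
   infile "C:\data\stats.txt" firstobs=7 obs=9;
   /* each row holds the New Form values, then the Old Form values */
   input Stat $ a_j b_j c_j a_j2 b_j2 c_j2;
   output New;                          /* New Form columns as read */
   a_j = a_j2; b_j = b_j2; c_j = c_j2;
   output Old;                          /* Old Form columns, renamed */
run;
```

The KEEP= dataset options give both tables the same variable names, so each ends up with three rows (N.Obs., Mean, Std.Dev.) and columns a_j, b_j, c_j.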

ChrisNZ
Tourmaline | Level 20

You need custom code for this.

Nothing complicated, something like this:

data _null_;
  file "&wdir\t.txt";
  put '==[Summary Statistics by Model]===                                      ';
  put '                                                                        ';
  put 'The three-parameter logistic (3PL) model:                               ';
  put '                       New Form                     Old Form            ';
  put '             --------------------------     --------------------------  ';
  put '               a_j       b_j       c_j        a_j       b_j       c_j   ';
  put 'N.Obs.           10        10        10         10        10        10  ';
  put 'Mean         1.0000   -0.5549    2.0000     2.0000   -0.8378    0.4848  ';
  put 'Std.Dev.     0.8737    0.7388    0.4899     1.0000    0.4838    1.9849  ';
run;

data WANT;
  length VAR1-VAR18 $8;
  retain VAR1-VAR18;
  infile "&wdir\t.txt";
  input;
  if _INFILE_ = '==[Summary Statistics by Model]===' then FOUND+1;
  if _INFILE_ =:'N.Obs. ' & FOUND then do;
    VAR1 =scan(_INFILE_,2,' ');
    VAR2 =scan(_INFILE_,3,' ');
    VAR3 =scan(_INFILE_,4,' ');
    VAR4 =scan(_INFILE_,5,' ');
    VAR5 =scan(_INFILE_,6,' ');
    VAR6 =scan(_INFILE_,7,' ');
  end;
  if _INFILE_ =:'Mean' & FOUND then do;
    VAR7 =scan(_INFILE_,2,' ');
    VAR8 =scan(_INFILE_,3,' ');
    VAR9 =scan(_INFILE_,4,' ');
    VAR10=scan(_INFILE_,5,' ');
    VAR11=scan(_INFILE_,6,' ');
    VAR12=scan(_INFILE_,7,' ');
  end;
  if _INFILE_ =:'Std.Dev.' & FOUND then do;
    VAR13=scan(_INFILE_,2,' ');
    VAR14=scan(_INFILE_,3,' ');
    VAR15=scan(_INFILE_,4,' ');
    VAR16=scan(_INFILE_,5,' ');
    VAR17=scan(_INFILE_,6,' ');
    VAR18=scan(_INFILE_,7,' ');
    output;      
    stop;
  end; 
run;

Obs VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 VAR7 VAR8 VAR9 VAR10 VAR11 VAR12 VAR13 VAR14 VAR15 VAR16 VAR17 VAR18
1 10 10 10 10 10 10 1.0000 -0.5549 2.0000 2.0000 -0.8378 0.4848 0.8737 0.7388 0.4899 1.0000 0.4838 1.9849

 

jimbarbour
Meteorite | Level 14

Nicely done, @ChrisNZ.

 

Jim

ChrisNZ
Tourmaline | Level 20

Or if you want to mix both approaches, something like this:

data WANT;
  infile "&wdir\t.txt";
  input @;
  if _INFILE_ = '==[Summary Statistics by Model]===' then FOUND+1;
  if _INFILE_ =:'N.Obs. ' & FOUND;
  input @1 STAT $ NEW1 NEW2 NEW3 OLD1 OLD2 OLD3; output;
  input @1 STAT $ NEW1 NEW2 NEW3 OLD1 OLD2 OLD3; output;
  input @1 STAT $ NEW1 NEW2 NEW3 OLD1 OLD2 OLD3; output;
  stop; 
run;

 

 

mkeintz
PROC Star

This is a case where the @'character-string' feature of the INPUT statement is just the right tool:

 

data want;
  input @'N.Obs.'   n1-n6
        @'Mean'     mu1-mu6
        @'Std.Dev.' std1-std6 ;
datalines;
===[Summary Statistics by Model]===

The three-parameter logistic (3PL) model:
                       New Form                     Old Form
             --------------------------     --------------------------
               a_j       b_j       c_j        a_j       b_j       c_j
N.Obs.           10        10        10         10        10        10
Mean         1.0000   -0.5549    2.0000     2.0000   -0.8378    0.4848
Std.Dev.     0.8737    0.7388    0.4899     1.0000    0.4838    1.9849
;
run;

 

It doesn't matter how far down the "N.Obs." line is. All lines before it will be skipped by the INPUT statement until "N.Obs." is encountered.

 

As long as you always have three lines starting with "N.Obs.", "Mean", and "Std.Dev.", in that order (though not necessarily consecutive), and each always has 6 data columns, this works just fine.
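The same technique applies when reading the external file directly; a sketch with a placeholder path:

```
data want;
  infile "C:\data\stats.txt";
  input @'N.Obs.'   n1-n6
        @'Mean'     mu1-mu6
        @'Std.Dev.' std1-std6 ;
run;
```

Under the default FLOWOVER behavior, each @'string' search advances through the file line by line until its string is found, so no FIRSTOBS= is needed.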

 

 


