topic Re: Stuck trying to import/input an oddly delimited text file. INPUT or parse post-PROC IMPORT? in SAS Programming

Stuck trying to import/input an oddly delimited text file. INPUT or parse post-PROC IMPORT?

deltron — Thu, 01 Mar 2018 15:31:16 GMT

Hey all,

I'm a little stuck with how best to approach the input of a text file.

Here's what my data looks like -

HOUSEHOLD RECORD

DATA       SIZE   BEGIN RANGE

D HRECORD     1      1   (1:1)
U All households
V          1 .Household record

D H_SEQ       5      2 (00001:99999)
     Household sequence number
V All households
V     00001- .Household sequence number
V     99999 .

D HHPOS       2      7 (00:00)
   Trailer portion of unique household ID.
00 for HH record. Same function in
family record is field FFPOS (01-39).
Same function in person record is PPPOS
(41-79).

Following "D " is always the variable name ("HRECORD"), which I want to input. Following the variable is the length ("size" column) which I also wish to input. "U" designates the universe; I want to keep that string. "V" designates the values for that variable, which I also want bring in. A variation of this structure, is when the "D" row is followed by a variable description, (between D and U or V).

I'm not exactly sure how to import/input this text file considering it's data structure. A PROC IMPORT will bring everything in, but into just one column and some descriptions are truncated/cut off. I've thought about using character functions to start parsing the data.

The goal is to have a data set with the following columns: Var_Name, Length, Description, Universe, and Values.

Any insight?

(SAS 9.4 TS Level 1M4, Enterprise Guide 7.1)

Re: Stuck trying to import/input an oddly delimited text file. INPUT or parse post-PROC IMPORT?

RW9 — Thu, 01 Mar 2018 15:52:35 GMT

There is no simple method I am afraid, simple methods are only available for standard file formats.

Read in the data using a datastep and infile. Input the text record as a whole, then post process it. E.g.

data want;
  infile "youdata.txt";
  input;
  if char(_input_,1)="D" and char(_input_,2)=" " then do;
    ...;
  end;
run;

Note, its not even a delimited file!

Re: Stuck trying to import/input an oddly delimited text file. INPUT or parse post-PROC IMPORT?

Tom — Thu, 01 Mar 2018 15:53:09 GMT

Are you asking how to read this data dictionary report into a metadata dataset?

Or are you asking how to use the information in this report to read the actual data file it is describing?

Neither look like they are "delimited". Both appear to be using fixed format.

The data dictionary you posted is just using different record structures on different lines based on what information is on those line.

You can read this using a data step pretty easily.

Re: Stuck trying to import/input an oddly delimited text file. INPUT or parse post-PROC IMPORT?

deltron — Thu, 01 Mar 2018 16:02:14 GMT

Yes, I'm asking how to specifically read parts of this data dictionary into a metadata dataset.

Re: Stuck trying to import/input an oddly delimited text file. INPUT or parse post-PROC IMPORT?

deltron — Thu, 01 Mar 2018 16:11:20 GMT

Thank you. I'll give this a try.

Re: Stuck trying to import/input an oddly delimited text file. INPUT or parse post-PROC IMPORT?

Tom — Thu, 01 Mar 2018 16:24:29 GMT

Looks like you have multiple datasets/files defined in this single document. There appear to be headers using **** between the files. It might just be easier to manually split the file into separate files for each one, but you could probably design a method to detect those also.

You have mainly D and V records and a few U records which it is not immediately obvious if they are useful for anything. The tricky part will be if you want to also read the multiple line descriptions that follow the D records and also some of the V records also appear to be split over multiple lines.

Try something like this to start and then see if you can either make it handle more variation, or just manually update the results with more details than you could pull with the program.

data variables (keep=name length start range label )
        values (keep=name vdescr vstart vlabel)
;
  length name $32 length start 8 range $20 label $256 ;
  length vdescr $256 vstart $32 vlabel $256 ; 
  retain NAME VDESCR;
  infile 'myfile.txt' truncover ;
  input @ ;
  if _infile_=:'D' then do ;
   input name start length range / label $256. ;
   output variables ;
   vdescr = ' ';
  end;
  if _infile_=:'V' then do;
    if _infile_^=: 'V   ' then input @2 vdescr $256. ;
    else do;
      vstart = left(substr(_infile_,2,12));
      vlabel = left(substr(_infile_,13));
      output values;
    end;
  end;
run;

Re: Stuck trying to import/input an oddly delimited text file. INPUT or parse post-PROC IMPORT?

deltron — Thu, 01 Mar 2018 18:10:43 GMT

I'll play around with this and see. Thank you for your time.