DATA Step, Macro, Functions and more

Stuck trying to import/input an oddly delimited text file. INPUT or parse post-PROC IMPORT?

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 6
Accepted Solution

Stuck trying to import/input an oddly delimited text file. INPUT or parse post-PROC IMPORT?

Hey all,

I'm a little stuck with how best to approach the input of a text file.

Here's what my data looks like -

 

HOUSEHOLD RECORD

DATA       SIZE   BEGIN RANGE                  

D HRECORD     1      1   (1:1)
U All households
V          1 .Household record

D H_SEQ       5      2  (00001:99999)
     Household sequence number
V All households
V     00001- .Household sequence number
V     99999  .

D HHPOS       2      7  (00:00)
    Trailer portion of unique household ID.
00 for HH record. Same function in
family record is field FFPOS (01-39).
Same function in person record is PPPOS
(41-79).

 

Following "D " is always the variable name ("HRECORD"), which I want to input. Following the variable is the length ("size" column) which I also wish to input. "U" designates the universe; I want to keep that string. "V" designates the values for that variable, which I also want bring in. A variation of this structure, is when the "D" row is followed by a variable description, (between D and U or V).


I'm not exactly sure how to import/input this text file considering it's data structure. A PROC IMPORT will bring everything in, but into just one column and some descriptions are truncated/cut off. I've thought about using character functions to start parsing the data.

 

The goal is to have a data set with the following columns: Var_Name, Length, Description, Universe, and Values.

 

Any insight?

 

(SAS 9.4 TS Level 1M4, Enterprise Guide 7.1)


Accepted Solutions
Solution
‎03-07-2018 07:49 AM
Super User
Super User
Posts: 8,264

Re: Stuck trying to import/input an oddly delimited text file. INPUT or parse post-PROC IMPORT?

[ Edited ]

Looks like you have multiple datasets/files defined in this single document.  There appear to be headers using **** between the files. It might just be easier to manually split the file into separate files for each one, but you could probably design a method to detect those also.

 

You have mainly D and V records and a few U records which it is not immediately obvious if they are useful for anything. The tricky part will be if you want to also read the multiple line descriptions that follow the D records and also some of the V records also appear to be split over multiple lines.

 

Try something like this to start and then see if you can either make it handle more variation, or just manually update the results with more details than you could pull with the program.

 

data variables (keep=name length start range label )
        values (keep=name vdescr vstart vlabel)
;
  length name $32 length start 8 range $20 label $256 ;
  length vdescr $256 vstart $32 vlabel $256 ; 
  retain NAME VDESCR;
  infile 'myfile.txt' truncover ;
  input @ ;
  if _infile_=:'D' then do ;
   input name start length range / label $256. ;
   output variables ;
vdescr = ' '; end; if _infile_=:'V' then do; if _infile_^=: 'V ' then input @2 vdescr $256. ; else do; vstart = left(substr(_infile_,2,12)); vlabel = left(substr(_infile_,13)); output values; end; end; run;

 

View solution in original post


All Replies
Super User
Super User
Posts: 9,799

Re: Stuck trying to import/input an oddly delimited text file. INPUT or parse post-PROC IMPORT?

There is no simple method I am afraid, simple methods are only available for standard file formats.

Read in the data using a datastep and infile.  Input the text record as a whole, then post process it.  E.g.

data want;
  infile "youdata.txt";
  input;
  if char(_input_,1)="D" and char(_input_,2)=" " then do;
    ...;
  end;
run;

Note, its not even a delimited file!

Occasional Contributor
Posts: 6

Re: Stuck trying to import/input an oddly delimited text file. INPUT or parse post-PROC IMPORT?

Thank you. I'll give this a try.

Super User
Super User
Posts: 8,264

Re: Stuck trying to import/input an oddly delimited text file. INPUT or parse post-PROC IMPORT?

Are you asking how to read this data dictionary report into a metadata dataset?

Or are you asking how to use the information in this report to read the actual data file it is describing?

Neither look like they are "delimited".  Both appear to be using fixed format.

The data dictionary you posted is just using different record structures on different lines based on what information is on those line.

 

You can read this using a data step pretty easily.  

Occasional Contributor
Posts: 6

Re: Stuck trying to import/input an oddly delimited text file. INPUT or parse post-PROC IMPORT?

Yes, I'm asking how to specifically read parts of this data dictionary into a metadata dataset.

Solution
‎03-07-2018 07:49 AM
Super User
Super User
Posts: 8,264

Re: Stuck trying to import/input an oddly delimited text file. INPUT or parse post-PROC IMPORT?

[ Edited ]

Looks like you have multiple datasets/files defined in this single document.  There appear to be headers using **** between the files. It might just be easier to manually split the file into separate files for each one, but you could probably design a method to detect those also.

 

You have mainly D and V records and a few U records which it is not immediately obvious if they are useful for anything. The tricky part will be if you want to also read the multiple line descriptions that follow the D records and also some of the V records also appear to be split over multiple lines.

 

Try something like this to start and then see if you can either make it handle more variation, or just manually update the results with more details than you could pull with the program.

 

data variables (keep=name length start range label )
        values (keep=name vdescr vstart vlabel)
;
  length name $32 length start 8 range $20 label $256 ;
  length vdescr $256 vstart $32 vlabel $256 ; 
  retain NAME VDESCR;
  infile 'myfile.txt' truncover ;
  input @ ;
  if _infile_=:'D' then do ;
   input name start length range / label $256. ;
   output variables ;
vdescr = ' '; end; if _infile_=:'V' then do; if _infile_^=: 'V ' then input @2 vdescr $256. ; else do; vstart = left(substr(_infile_,2,12)); vlabel = left(substr(_infile_,13)); output values; end; end; run;

 

Occasional Contributor
Posts: 6

Re: Stuck trying to import/input an oddly delimited text file. INPUT or parse post-PROC IMPORT?

I'll play around with this and see. Thank you for your time.
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 171 views
  • 1 like
  • 3 in conversation