BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
deltron
Fluorite | Level 6

Hey all,

I'm a little stuck with how best to approach the input of a text file.

Here's what my data looks like -

 

HOUSEHOLD RECORD

DATA       SIZE   BEGIN RANGE                  

D HRECORD     1      1   (1:1)
U All households
V          1 .Household record

D H_SEQ       5      2  (00001:99999)
     Household sequence number
V All households
V     00001- .Household sequence number
V     99999  .

D HHPOS       2      7  (00:00)
    Trailer portion of unique household ID.
00 for HH record. Same function in
family record is field FFPOS (01-39).
Same function in person record is PPPOS
(41-79).

 

Following "D " is always the variable name ("HRECORD"), which I want to input. Following the variable is the length ("size" column) which I also wish to input. "U" designates the universe; I want to keep that string. "V" designates the values for that variable, which I also want bring in. A variation of this structure, is when the "D" row is followed by a variable description, (between D and U or V).


I'm not exactly sure how to import/input this text file considering it's data structure. A PROC IMPORT will bring everything in, but into just one column and some descriptions are truncated/cut off. I've thought about using character functions to start parsing the data.

 

The goal is to have a data set with the following columns: Var_Name, Length, Description, Universe, and Values.

 

Any insight?

 

(SAS 9.4 TS Level 1M4, Enterprise Guide 7.1)

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Looks like you have multiple datasets/files defined in this single document.  There appear to be headers using **** between the files. It might just be easier to manually split the file into separate files for each one, but you could probably design a method to detect those also.

 

You have mainly D and V records and a few U records which it is not immediately obvious if they are useful for anything. The tricky part will be if you want to also read the multiple line descriptions that follow the D records and also some of the V records also appear to be split over multiple lines.

 

Try something like this to start and then see if you can either make it handle more variation, or just manually update the results with more details than you could pull with the program.

 

data variables (keep=name length start range label )
        values (keep=name vdescr vstart vlabel)
;
  length name $32 length start 8 range $20 label $256 ;
  length vdescr $256 vstart $32 vlabel $256 ; 
  retain NAME VDESCR;
  infile 'myfile.txt' truncover ;
  input @ ;
  if _infile_=:'D' then do ;
   input name start length range / label $256. ;
   output variables ;
vdescr = ' '; end; if _infile_=:'V' then do; if _infile_^=: 'V ' then input @2 vdescr $256. ; else do; vstart = left(substr(_infile_,2,12)); vlabel = left(substr(_infile_,13)); output values; end; end; run;

 

View solution in original post

6 REPLIES 6
RW9
Diamond | Level 26 RW9
Diamond | Level 26

There is no simple method I am afraid, simple methods are only available for standard file formats.

Read in the data using a datastep and infile.  Input the text record as a whole, then post process it.  E.g.

data want;
  infile "youdata.txt";
  input;
  if char(_input_,1)="D" and char(_input_,2)=" " then do;
    ...;
  end;
run;

Note, its not even a delimited file!

Tom
Super User Tom
Super User

Are you asking how to read this data dictionary report into a metadata dataset?

Or are you asking how to use the information in this report to read the actual data file it is describing?

Neither look like they are "delimited".  Both appear to be using fixed format.

The data dictionary you posted is just using different record structures on different lines based on what information is on those line.

 

You can read this using a data step pretty easily.  

deltron
Fluorite | Level 6

Yes, I'm asking how to specifically read parts of this data dictionary into a metadata dataset.

Tom
Super User Tom
Super User

Looks like you have multiple datasets/files defined in this single document.  There appear to be headers using **** between the files. It might just be easier to manually split the file into separate files for each one, but you could probably design a method to detect those also.

 

You have mainly D and V records and a few U records which it is not immediately obvious if they are useful for anything. The tricky part will be if you want to also read the multiple line descriptions that follow the D records and also some of the V records also appear to be split over multiple lines.

 

Try something like this to start and then see if you can either make it handle more variation, or just manually update the results with more details than you could pull with the program.

 

data variables (keep=name length start range label )
        values (keep=name vdescr vstart vlabel)
;
  length name $32 length start 8 range $20 label $256 ;
  length vdescr $256 vstart $32 vlabel $256 ; 
  retain NAME VDESCR;
  infile 'myfile.txt' truncover ;
  input @ ;
  if _infile_=:'D' then do ;
   input name start length range / label $256. ;
   output variables ;
vdescr = ' '; end; if _infile_=:'V' then do; if _infile_^=: 'V ' then input @2 vdescr $256. ; else do; vstart = left(substr(_infile_,2,12)); vlabel = left(substr(_infile_,13)); output values; end; end; run;

 

deltron
Fluorite | Level 6
I'll play around with this and see. Thank you for your time.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 2460 views
  • 1 like
  • 3 in conversation