BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
CHL0320
Obsidian | Level 7

Dear Community;

I need to read a text file that is a codebook to create an SAS dataset. This SAS dataset only contains 4 variables, var_name, Label, type, and Explanation. A short version of the text file is attached below (the number of variables is more than 300). The want dataset (shown in Excel, but .sas7bdat is desired) is also included. Thank you so much for your time.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Is your file representative of the actual file(s)?  It does appear pretty organized to me.

 

If all of the variables have all of the elements (and the case is consistent) then you could just use @'text' input style to jump to the location of the next metadata field.

 

So you might try something like this:

data codebook;
  infile 'c:\downloads\have.txt' lrecl=132 pad ;
  if _n_=1 then do;
    input @'Form name:' formname $80. 
          @'Form code:' formcode $10.
          @'SAS datafile name:' formsasname $10.
    ;
    retain form: ;
  end;
  varnum+1;
  input @'var name:' varname $32.
        @'Label:' label $100.
        @'type:' type $20.
        @'range:' range $30.
        @'length:' length $20.
        @'format:' format $32.
        @'missing code:' missingcode $50.
  ;
run;

Result

Tom_0-1712106058615.png

If some of the elements (like LABEL for example) is not there then you might need to work a little harder.  Basically read each line and figure out if it one of interest and then extract the value you want from it.  Retain the values and then detect when a variable's definition ends (looks like that is signaled by a line of hyphens) and output the record and clear the retained variables.

View solution in original post

8 REPLIES 8
data_null__
Jade | Level 19

This works but I had to use the ancient CAPS option, because I don't know how to get input @'String' input to ignore case. 

 

options caps=1;
data want;
   infile cards;

   input @'VAR NAME: '     var_name:$32.
         @'LABEL: '        label &$128.
         @'TYPE: '         type :$8.
         @'EXPLANATION: '  Explanation &$128.
         ;
   cards4;
Clinical Trial - Data Base Codebook
-----------------------------------------------------
Form name:QUESTIONNAIRE- PART B
Form code: BL
SAS datafile name: bu

NOTE:  First version - v40.0 - codebook

Versions:
v41.0 - Second version
        Created:
   Uploaded:


Codebook last updated:

---------------------------------------------------------------------------------
var name: HRBv1


Label: id

---------------------------------------------------------------------------------

                type:  numeric
                 range:  none
   length:
   format:
         missing code:

Explanation: a number given by the system.
Source: Automatically entered from computerized system
name of variable in Dump file:   id

-------------------------------------------------------------------------------
var name: HRBv2

Label: qs_checklist_name

-------------------------------------------------------------------------------

                  type:  string
      length: 15
      format: varchar

           missing code:


explanation: information about the visit: bl, number.
Source: Automatically entered from computerized system
name of variable in Dump file:   qs_checklist_name

-------------------------------------------------------------------------------
var name: HRBv3
Label: pdf_file_size


-------------------------------------------------------------------------------

                 type:  numeric
                 range:
      length:  20
      format: varchar
           missing code:

explanation: size of the pdf file in bytes.
Source: Automatically entered from computerized system
name of variable in Dump file:   pdf_file_size


-------------------------------------------------------------------------------
var name: HRBv4
Label: pdf_file_status

-------------------------------------------------------------------------------

                  type:  numeric (byte)

                 range:  [0,2]
      length:  1
      format: char
           missing code:   0


explanation: status of the pdf file: 0=blank, 1=draft, 2=final.
Source: Automatically entered from computerized system
name of variable in Dump file:   pdf_file_status
-------------------------------------------------------------------------------
var name: HRBv5
Label: user_pin
-------------------------------------------------------------------------------
type:  numeric (int)

                 range:
      length: 4
      format: integer
           missing code:

explanation: the user's pin number.
Source: Automatically entered from computerized system
name of variable in Dump file:   user_pin
;;;;
   run;

Capture.PNG

ballardw
Super User

Might explain to @CHL0320 that an INPUT statement using : Input @'some string value' moves the pointer in the input buffer to the next position where the value of 'some string value' exists and reads from there to get the value of the variable.

mkeintz
PROC Star

Do you need "option caps=1"?  It looks like the @ strings are consistent throughout the document.  So if you know the consistent case for each search string,  couldn't it be as below?

 

   input @'var name: '     var_name:$32.
         @'Label: '        label &$128.
         @'type: '         type :$8.
         @'Explanation: '  Explanation &$128.
         ;

Without resorting to the CAPS option?

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ballardw
Super User

@mkeintz 

Something to handle case was needed. In the posted example the first two sets have "explanation" and "Explanation" .

Which would make me extremely cautious about the remainder of the file and the key words.

 


@mkeintz wrote:

Do you need "option caps=1"?  It looks like the @ strings are consistent throughout the document.  So if you know the consistent case for each search string,  couldn't it be as below?

 

   input @'var name: '     var_name:$32.
         @'Label: '        label &$128.
         @'type: '         type :$8.
         @'Explanation: '  Explanation &$128.
         ;

Without resorting to the CAPS option?

 

 

 


 

Tom
Super User Tom
Super User

Is your file representative of the actual file(s)?  It does appear pretty organized to me.

 

If all of the variables have all of the elements (and the case is consistent) then you could just use @'text' input style to jump to the location of the next metadata field.

 

So you might try something like this:

data codebook;
  infile 'c:\downloads\have.txt' lrecl=132 pad ;
  if _n_=1 then do;
    input @'Form name:' formname $80. 
          @'Form code:' formcode $10.
          @'SAS datafile name:' formsasname $10.
    ;
    retain form: ;
  end;
  varnum+1;
  input @'var name:' varname $32.
        @'Label:' label $100.
        @'type:' type $20.
        @'range:' range $30.
        @'length:' length $20.
        @'format:' format $32.
        @'missing code:' missingcode $50.
  ;
run;

Result

Tom_0-1712106058615.png

If some of the elements (like LABEL for example) is not there then you might need to work a little harder.  Basically read each line and figure out if it one of interest and then extract the value you want from it.  Retain the values and then detect when a variable's definition ends (looks like that is signaled by a line of hyphens) and output the record and clear the retained variables.

CHL0320
Obsidian | Level 7

Thank you for all your help. There is a problem with the consistency in variable cases. For example, I need to have a variable "explanation" to be included but some of them are "Explantation".

ballardw
Super User

@CHL0320 wrote:

Thank you for all your help. There is a problem with the consistency in variable cases. For example, I need to have a variable "explanation" to be included but some of them are "Explantation".


That is what @data_null__ 's use of the system option CAPS handle. It translates all characters to upper case so the comparison is consistent.

 

Or talk to whoever prepares that file and fix why the case changes.

Tom
Super User Tom
Super User

@CHL0320 wrote:

Thank you for all your help. There is a problem with the consistency in variable cases. For example, I need to have a variable "explanation" to be included but some of them are "Explantation".


If you need to read those fields (why?) then you must go with the more complicate approach. 

 

Something like this that reads the tag text and cleans it up before deciding what metadata variable it means.

data codebook;
  infile 'c:\downloads\have.txt' dsd dlm=':' truncover end=eof;
  length formname $20 formcode $10
         varnum 8 varname $32
/* Define the other variables */
         explanation $200
  ;
  retain varnum 1 formname -- explanation;
  input tag :$200. @ ;
  tag=lowcase(compress(tag,'20A0090a0d'x));
  drop tag ;
  if index(_infile_,':') then select (tag);
    when ('varname')  do;
      if varname ne ' ' then do;
         output;
         varnum+1;
         call missing(of varname--explanation);
      end;
      input varname $200.;
    end;
    when ('formname') input formname $200.;
    when ('formcode') input formcode $200.;
    when ('explanation') input explanation $200.;
/* fill in the other possible variables */
    otherwise ;
  end;
  if eof and varname ne ' ' then output;
run;

Result

Tom_0-1712155703124.png

If it is possible that a metadata field value could span multiple lines you will need more than just the simple INPUT statement in the WHEN clause for that metadata field.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1753 views
  • 9 likes
  • 5 in conversation