BookmarkSubscribeRSS Feed
UV_123
Calcite | Level 5

Hi All,

 

I need to know how to identify a particular string and select all the strings after that and pass it as column names. 

For example, I have the below data:

 

To be considered:
XYZ
STU
LMN
 
Random Data

 

I want to identify the string "To be Considered:" and select all the row values after the string and make them variable names followed by  _1 and _2. 

 

My output columns should be like this:

XYZ_1XYZ_2STU_1STU_2LMN_1LMN_2

 

Please help. 

 

 

 

20 REPLIES 20
ballardw
Super User

Pass what values as "column names" where? To what process?

What does the "random data" look like afterward? How doe we know when the "column names" end and "random data starts, as in a rule that will always identify such? Since the column to hold text such as "To be considered" all the values in that "column" will be character.

 

Provide an actual example of your current data in the form of a data step and an example data step of what you expect the result to be. Since you don't show any values for the variables I suspect you actually aren't completely describing your problem, process or desired output. SAS variables without any values for the variables are pretty odd. So what is going to be done with the resulting data?

UV_123
Calcite | Level 5

Hi @ballardw,

 

Thanks a lot for your reply. 

Actually, I have attached my data, and it's a .txt file. 

 

The Random data lines contain headings of the data and all the other text, which can be ignored. 

The text after "includes.." and below "To be considered:" till a blank (as in the example above) should be the columns headers and I want 2 columns for each. I want this to assign values from the main data, which starts with a number, for example, 123456789123, and the fields under that number belong to it with variables same as the column headers we have mentioned above. 

 

So, I want my output to look like this:

Column1

ABC_1

ABC_2

DEF_1

DEF_2

GHI_1

GHI_2

JKL_1

JKL_2

XYZ_1

XYZ_2

STU_1

STU_2

LMN_1

LMN_2

123456789123

12

34

56

37

 

 

64

73

 

 

 

 

 

 

23455122467

67

43

 

 

12

46

87

56

45

78

 

 

 

 

5674512344

78

45

12

54

 

 

 

 

 

 

32

82

45

36

 

I hope I have answered all your questions. Please help. 

 

@SASKiwi @PaigeMiller: For some reason, my previous post got deleted. This is an example of what my data looks like and what I want. Any help will be greatly appreciated. 

 

Tom
Super User Tom
Super User

So you are trying to figure out how to read a textual report?

 

It is a pretty simple task in SAS.  But you need to know the format of the file to be able to programmatically figure where you are while reading it.

 

So it looks to me like the data is organized under an ID number.  

Is it safe to assume that any line that starts with a number like that the start of new group of values?

 

First let's share your sample text file in a form that anyone can run on their version of SAS.

So using PARMCARDS statement, PARMCARDS option and FILENAME statement you can make sample file in your WORK directory.

 

Spoiler

 

options parmcards=txt ;
filename txt temp;
parmcards4;
Random Data
Random Data
Random Data
Include... ABC
Include... DEF
Include... GHI
Include... JKL

123456789123
ABC             ..   ..  12   34
DEF             ..   ..  56   37
JKL             ..   ..  64   73
23455122467
ABC             ..   ..  67   43
GHI             ..   ..  12   46
JKL             ..   ..  87   56
XYZ             ..   ..  45   78
5674512344
ABC             ..   ..  78   45
DEF             ..   ..  12   54
STU             ..   ..  32   82
LMN             ..   ..  45   36

To be considered:
ABC
XYZ
STU
LMN

Random Data
;;;;

Now let's write a data step to read through file and pull out two thing.

One is the list of "vars" to INCLUDE from that header text.

The other is the actual data in a simplified ROW*COL format.

data consider (keep=varname)
     results (keep=id row varname col value)
;
   infile txt truncover length=ll column=cc ;

   input @1 dummy ?? @;
   if not missing(dummy) then do;
      id = dummy;
      retain id;
      input / dummy ?? @@ ;
      do row=1 by 1 while( missing(dummy));
         input @1 varname :$32. @ ;
         do col=1 by 1 while(cc<= ll );
           input value ?? @;
           if not missing(value) then output results;
         end;
         input / dummy ?? @@;
      end;
   end;
   else if _infile_=:'Include' then do;
      input varname ;
      output consider;
   end;
run;

You will probably need to tweak the logic as you figure out more about how the report is structured.

 

Results:

The list of variables to "include" or "consider".

Obs    varname

 1       ABC
 2       DEF
 3       GHI
 4       JKL

The actual data:

Obs         id         row    varname    col    value

  1    123456789123     1       ABC       3       12
  2    123456789123     1       ABC       4       34
  3    123456789123     2       DEF       3       56
  4    123456789123     2       DEF       4       37
  5    123456789123     3       JKL       3       64
  6    123456789123     3       JKL       4       73
  7     23455122467     1       ABC       3       67
  8     23455122467     1       ABC       4       43
  9     23455122467     2       GHI       3       12
 10     23455122467     2       GHI       4       46
 11     23455122467     3       JKL       3       87
 12     23455122467     3       JKL       4       56
 13     23455122467     4       XYZ       3       45
 14     23455122467     4       XYZ       4       78
 15      5674512344     1       ABC       3       78
 16      5674512344     1       ABC       4       45
 17      5674512344     2       DEF       3       12
 18      5674512344     2       DEF       4       54
 19      5674512344     3       STU       3       32
 20      5674512344     3       STU       4       82
 21      5674512344     4       LMN       3       45
 22      5674512344     4       LMN       4       36

If you really need that unwieldy wide format (perhaps for exporting to EXCEL for someone to browse?) you can use PROC TRANSPOSE.

proc transpose data=results out=wide (drop=_name_);
  by id  notsorted;
  id varname col;
  var value;
run;

Tom_0-1654028646803.png

 

UV_123
Calcite | Level 5

Thanks a lot for the reply. 

 

Just a quick question: what does dummy ?? @ signify in the program. 

Tom
Super User Tom
Super User

It is part of an INPUT statement.

DUMMY is a variable that SAS will create as numeric since it has not been defined before.  Since it is not included in either of the KEEP= dataset option lists it will not written to either dataset. So it is just a dummy variable used to test if the first word on the line is a number.

 

?? is input modifier that says don't issue any errors if the text being read is not a valid number.

 

Trailing @ means to stay on the current line for this iteration of the data step.

Trailing double @ means to stay on he current line even across iterations of the data step.

 

UV_123
Calcite | Level 5

How to define the length and column. In your case it's 11 and column=cc. 

Tom
Super User Tom
Super User

The LENGTH= and COLUMN= options on the INFILE statement define the NAMES of the variables that will contain the length of the current line from the file and location of the column pointer that determines where on that line SAS is currently set to next read.

 

In that program the names were LL (line length) and CC (current column). 

UV_123
Calcite | Level 5

Also, suppose if there is some string present in front of a number for example: view 00001 no #123456789123 then can we define dummy using a substring function to get the desired number that is 123456789123?

Tom
Super User Tom
Super User

@UV_123 wrote:

Also, suppose if there is some string present in front of a number for example: view 00001 no #123456789123 then can we define dummy using a substring function to get the desired number that is 123456789123?


In that case you will need to modify the logic used to detect when a new ID number has appeared.  The current method of just reading the first word on the line into a numeric variable will not work.  But you could perhaps use the other method of comparing the start of the current line to a string literal that was also used in that example program to detect the lines you had that listed the variables to INCLUDE.

 

So perhaps something like:

if _infile_ =: 'view' then do;
   input @ '#' id ;
end;
UV_123
Calcite | Level 5

Hi @Tom ,

 

Sorry for the trouble. So, in this case, the code will look somewhat like this:

data consider (keep=varname)
     results (keep=id row varname col value)
;
   infile txt truncover length=ll column=cc ;

 if _infile_ =: 'view' then do;
   input @ '#' id ;
end;
   /*if not missing('view') then do;
      id = dummy;
      retain id;
      input / dummy ?? @@ ;*/
      do row=1 by 1 while( missing('view'));
         input @1 varname :$32. @ ;
         do col=1 by 1 while(cc<= ll );
           input value id @;
           if not missing(value) then output results;
         end;
         input @ '#' id @@;
      end;
   end;
   else if _infile_=:'Include' then do;
      input varname ;
      output consider;
   end;
run;

I am not sure if I have done it correctly. Please do correct me. Thanks and sorry again. 

UV_123
Calcite | Level 5

Hi All,

 

I need help generating columns in SAS. Below is a sample of what my data looks like:

 

Random Data
Random Data
Random Data
Include... ABC
Include... DEF
Include... GHI
Include... JKL
 
123456789123
ABC $..$ ^..^ 12 34
DEF $..$ ^..^ 56 37
JKL  $..$ ^..^ 64 73
23455122467
ABC $..$ ^..^ 67 43
GHI  $..$ ^..^ 12 46
JKL  $..$ ^..^ 87 56
XYZ $..$ ^..^ 45 78
5674512344
ABC $..$ ^..^ 78 45
DEF $..$ ^..^ 12 54
STU $..$ ^..^ 32 82
LMN $..$ ^..^ 45 36
 
To be considered:
XYZ
STU
LMN
 
Random Data

 

I want to generate the final result as below:

 

Column1ABC_1ABC_2DEF_1DEF_2GHI_1GHI_2JKL_1JKL_2XYZ_1XYZ_2STU_1STU_2LMN_1LMN_2
12345678912312345637  6473      
234551224676743  124687564578    
567451234478451254      32824536

 

Any help will be greatly appreciated. 

 

@SASKiwi  @PaigeMiller : This is a more precise version of what I have and what I want. I appreciate your help. 

SASKiwi
PROC Star

@UV_123  - Looks like something happened to your previous post as it is completely blank. Can post your input data as DATA step DATALINES please otherwise we will have to spend time doing it. Is that the actual layout of your input data file as your data doesn't appear to be well organised? Please supply the actual layout of your input data file as DATALINES.

UV_123
Calcite | Level 5

Hi @SASKiwi ,

 

Please find attached data for your ref. 

mkeintz
PROC Star

 

  • What is the significance of the "include" rows?  Should data for identifiers NOT in an "include" row be ignored?
  • Is every row prior to the first "include" to be ignored?
  • What is the significance of the "to be considered" row?
  • It appears that, for the "data" rows, you want only the rightmost two values, no matter how much content falls between the content identifier ("ABC", "DEF", etc) and those two values.  Is the correct?
  • Are the data values always numeric?
  • Are all the content identifiers three letters long?  If they can be longer, what is the maximum length (remember the ultimate variable names are limited to a length of 32 characters)?

 

This looks like a two step program: (1) get id's, content identifiers (future variable names), and data values in a long format, (2) transpose to a wide format.  What have you tried so far to read in the data - prior to transpose?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 20 replies
  • 2914 views
  • 1 like
  • 6 in conversation