Hi All,
I need to know how to identify a particular string and select all the strings after that and pass it as column names.
For example, I have the below data:
To be considered: |
XYZ |
STU |
LMN |
Random Data |
I want to identify the string "To be Considered:" and select all the row values after the string and make them variable names followed by _1 and _2.
My output columns should be like this:
XYZ_1 | XYZ_2 | STU_1 | STU_2 | LMN_1 | LMN_2 |
Please help.
Pass what values as "column names" where? To what process?
What does the "random data" look like afterward? How doe we know when the "column names" end and "random data starts, as in a rule that will always identify such? Since the column to hold text such as "To be considered" all the values in that "column" will be character.
Provide an actual example of your current data in the form of a data step and an example data step of what you expect the result to be. Since you don't show any values for the variables I suspect you actually aren't completely describing your problem, process or desired output. SAS variables without any values for the variables are pretty odd. So what is going to be done with the resulting data?
Hi @ballardw,
Thanks a lot for your reply.
Actually, I have attached my data, and it's a .txt file.
The Random data lines contain headings of the data and all the other text, which can be ignored.
The text after "includes.." and below "To be considered:" till a blank (as in the example above) should be the columns headers and I want 2 columns for each. I want this to assign values from the main data, which starts with a number, for example, 123456789123, and the fields under that number belong to it with variables same as the column headers we have mentioned above.
So, I want my output to look like this:
Column1 | ABC_1 | ABC_2 | DEF_1 | DEF_2 | GHI_1 | GHI_2 | JKL_1 | JKL_2 | XYZ_1 | XYZ_2 | STU_1 | STU_2 | LMN_1 | LMN_2 |
123456789123 | 12 | 34 | 56 | 37 |
|
| 64 | 73 |
|
|
|
|
|
|
23455122467 | 67 | 43 |
|
| 12 | 46 | 87 | 56 | 45 | 78 |
|
|
|
|
5674512344 | 78 | 45 | 12 | 54 |
|
|
|
|
|
| 32 | 82 | 45 | 36 |
I hope I have answered all your questions. Please help.
@SASKiwi @PaigeMiller: For some reason, my previous post got deleted. This is an example of what my data looks like and what I want. Any help will be greatly appreciated.
So you are trying to figure out how to read a textual report?
It is a pretty simple task in SAS. But you need to know the format of the file to be able to programmatically figure where you are while reading it.
So it looks to me like the data is organized under an ID number.
Is it safe to assume that any line that starts with a number like that the start of new group of values?
First let's share your sample text file in a form that anyone can run on their version of SAS.
So using PARMCARDS statement, PARMCARDS option and FILENAME statement you can make sample file in your WORK directory.
options parmcards=txt ;
filename txt temp;
parmcards4;
Random Data
Random Data
Random Data
Include... ABC
Include... DEF
Include... GHI
Include... JKL
123456789123
ABC .. .. 12 34
DEF .. .. 56 37
JKL .. .. 64 73
23455122467
ABC .. .. 67 43
GHI .. .. 12 46
JKL .. .. 87 56
XYZ .. .. 45 78
5674512344
ABC .. .. 78 45
DEF .. .. 12 54
STU .. .. 32 82
LMN .. .. 45 36
To be considered:
ABC
XYZ
STU
LMN
Random Data
;;;;
Now let's write a data step to read through file and pull out two thing.
One is the list of "vars" to INCLUDE from that header text.
The other is the actual data in a simplified ROW*COL format.
data consider (keep=varname)
results (keep=id row varname col value)
;
infile txt truncover length=ll column=cc ;
input @1 dummy ?? @;
if not missing(dummy) then do;
id = dummy;
retain id;
input / dummy ?? @@ ;
do row=1 by 1 while( missing(dummy));
input @1 varname :$32. @ ;
do col=1 by 1 while(cc<= ll );
input value ?? @;
if not missing(value) then output results;
end;
input / dummy ?? @@;
end;
end;
else if _infile_=:'Include' then do;
input varname ;
output consider;
end;
run;
You will probably need to tweak the logic as you figure out more about how the report is structured.
Results:
The list of variables to "include" or "consider".
Obs varname 1 ABC 2 DEF 3 GHI 4 JKL
The actual data:
Obs id row varname col value 1 123456789123 1 ABC 3 12 2 123456789123 1 ABC 4 34 3 123456789123 2 DEF 3 56 4 123456789123 2 DEF 4 37 5 123456789123 3 JKL 3 64 6 123456789123 3 JKL 4 73 7 23455122467 1 ABC 3 67 8 23455122467 1 ABC 4 43 9 23455122467 2 GHI 3 12 10 23455122467 2 GHI 4 46 11 23455122467 3 JKL 3 87 12 23455122467 3 JKL 4 56 13 23455122467 4 XYZ 3 45 14 23455122467 4 XYZ 4 78 15 5674512344 1 ABC 3 78 16 5674512344 1 ABC 4 45 17 5674512344 2 DEF 3 12 18 5674512344 2 DEF 4 54 19 5674512344 3 STU 3 32 20 5674512344 3 STU 4 82 21 5674512344 4 LMN 3 45 22 5674512344 4 LMN 4 36
If you really need that unwieldy wide format (perhaps for exporting to EXCEL for someone to browse?) you can use PROC TRANSPOSE.
proc transpose data=results out=wide (drop=_name_);
by id notsorted;
id varname col;
var value;
run;
Thanks a lot for the reply.
Just a quick question: what does dummy ?? @ signify in the program.
It is part of an INPUT statement.
DUMMY is a variable that SAS will create as numeric since it has not been defined before. Since it is not included in either of the KEEP= dataset option lists it will not written to either dataset. So it is just a dummy variable used to test if the first word on the line is a number.
?? is input modifier that says don't issue any errors if the text being read is not a valid number.
Trailing @ means to stay on the current line for this iteration of the data step.
Trailing double @ means to stay on he current line even across iterations of the data step.
How to define the length and column. In your case it's 11 and column=cc.
The LENGTH= and COLUMN= options on the INFILE statement define the NAMES of the variables that will contain the length of the current line from the file and location of the column pointer that determines where on that line SAS is currently set to next read.
In that program the names were LL (line length) and CC (current column).
Also, suppose if there is some string present in front of a number for example: view 00001 no #123456789123 then can we define dummy using a substring function to get the desired number that is 123456789123?
@UV_123 wrote:
Also, suppose if there is some string present in front of a number for example: view 00001 no #123456789123 then can we define dummy using a substring function to get the desired number that is 123456789123?
In that case you will need to modify the logic used to detect when a new ID number has appeared. The current method of just reading the first word on the line into a numeric variable will not work. But you could perhaps use the other method of comparing the start of the current line to a string literal that was also used in that example program to detect the lines you had that listed the variables to INCLUDE.
So perhaps something like:
if _infile_ =: 'view' then do;
input @ '#' id ;
end;
Hi @Tom ,
Sorry for the trouble. So, in this case, the code will look somewhat like this:
data consider (keep=varname)
results (keep=id row varname col value)
;
infile txt truncover length=ll column=cc ;
if _infile_ =: 'view' then do;
input @ '#' id ;
end;
/*if not missing('view') then do;
id = dummy;
retain id;
input / dummy ?? @@ ;*/
do row=1 by 1 while( missing('view'));
input @1 varname :$32. @ ;
do col=1 by 1 while(cc<= ll );
input value id @;
if not missing(value) then output results;
end;
input @ '#' id @@;
end;
end;
else if _infile_=:'Include' then do;
input varname ;
output consider;
end;
run;
I am not sure if I have done it correctly. Please do correct me. Thanks and sorry again.
Hi All,
I need help generating columns in SAS. Below is a sample of what my data looks like:
Random Data |
Random Data |
Random Data |
Include... ABC |
Include... DEF |
Include... GHI |
Include... JKL |
123456789123 |
ABC $..$ ^..^ 12 34 |
DEF $..$ ^..^ 56 37 |
JKL $..$ ^..^ 64 73 |
23455122467 |
ABC $..$ ^..^ 67 43 |
GHI $..$ ^..^ 12 46 |
JKL $..$ ^..^ 87 56 |
XYZ $..$ ^..^ 45 78 |
5674512344 |
ABC $..$ ^..^ 78 45 |
DEF $..$ ^..^ 12 54 |
STU $..$ ^..^ 32 82 |
LMN $..$ ^..^ 45 36 |
To be considered: |
XYZ |
STU |
LMN |
Random Data |
I want to generate the final result as below:
Column1 | ABC_1 | ABC_2 | DEF_1 | DEF_2 | GHI_1 | GHI_2 | JKL_1 | JKL_2 | XYZ_1 | XYZ_2 | STU_1 | STU_2 | LMN_1 | LMN_2 |
123456789123 | 12 | 34 | 56 | 37 | 64 | 73 | ||||||||
23455122467 | 67 | 43 | 12 | 46 | 87 | 56 | 45 | 78 | ||||||
5674512344 | 78 | 45 | 12 | 54 | 32 | 82 | 45 | 36 |
Any help will be greatly appreciated.
@SASKiwi @PaigeMiller : This is a more precise version of what I have and what I want. I appreciate your help.
@UV_123 - Looks like something happened to your previous post as it is completely blank. Can post your input data as DATA step DATALINES please otherwise we will have to spend time doing it. Is that the actual layout of your input data file as your data doesn't appear to be well organised? Please supply the actual layout of your input data file as DATALINES.
This looks like a two step program: (1) get id's, content identifiers (future variable names), and data values in a long format, (2) transpose to a wide format. What have you tried so far to read in the data - prior to transpose?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.