Hello everyone. I have a dataset that contains up to X columns, each of which are a character column of length $50. Somewhere in the dataset is the word "start". I need to figure out where the FIRST occurance of the word effective is (scanning row by row, from "left to right" (column 1- column xxx).
Then I need the output of the program to be the Column name (or can be number in dataset and I can figure out how to grab the column name) and the row where the word 'effective' was found.
I actually have no idea how to do this. I was thinking I could do some kind of "do" loop over the columns, checking each column for the word "effective", and then when it was found for the first time flag the column name... However the last step of the loop (outputting the column name) is not something i've ever seen done.
Does anyone have any idea on how to do this? For an example of data and the output I would want please see the following sas datasteps.
data person;
input Col1 $ Col2 $ Col3 $ Col4 $ Col5 $ Col6 $;
datalines;
John Sales John Sales John Sales
John Sales John Sales John Sales
John Sales start Sales John
John Sales John Sales John Sales
John Sales John Sales John Sales
;
In this example the word start is in Col3 row 3, so I would want the output to simply be a dataset that looks like follows.
Data DesiredOutput;
input ColumnName $ RowNumber $;
datalines;
Col3 3
;
run;
If anyone has any ideas of questions on what I am looking for please let me know! I will continue trying to come up with something until anyone is able to respond. Thanks again in advance for everyones help!
WHICHC and VNAME function are the inefficient brutal way of doing it...
I think the following code might solve my problem (in a very strange way).
data person;
input Col1 $ Col2 $ Col3 $ Col4 $ Col5 $ Col6 $;
datalines;
John Sales John Sales John start
John Sales John Sales John q
John Sales start Sales John
John Sales John Sales John d
John Sales John Sales John d
;
data mightwork;
set person;
array chararray{*} Col1 -- Col6;
i=1;
do while (chararray(i)~="start" and i<=5);
put i;
i+1;
end;
newvar=chararray(i);
run;
I think the reason it works is as follows. it is a while loop, so it checks the first 5 columns and if it ever finds the word "start" it stop and the while condition is never entered into again. Therefore I on that row is exactly equal to the column number that the string was found in.
If the value is found in the LAST column, then the loop has a problem. however I am defining a variable equal to "newvar=chararray(i)". If it doesn't find the character in the first 5 fields, i will become 6 and the newvar will be equal to the last value found within the dataset. I can then just write a subsetting data query that will check this newvar field for the string I want and thus subset the data (like so).
data answer(rename=(i=columnNumber));
set mightwork;
rownumber=_n_;
if newvar='start';
run;
This is a 'solution' but not one i'm particurally fond of due to how slow it is. Does anyone have any improvements over this solution?
Reeza thanks I've actually never heard of either of those functions I will look into them!
No idea on efficiency here, but you can test that and let me know
data person;
infile cards truncover;
input Col1 $ Col2 $ Col3 $ Col4 $ Col5 $ Col6 $;
datalines;
John Sales John Sales John start
John Sales John Sales John q
John Sales start Sales John
John Sales John Sales John d
John Sales John Sales John d
;
run;
data want;
set person;
array col(6) col1-col6;
found=whichc('start', of col1-col6);
if found>1 then do;
Row=_n_;
Variable=vname(col(found));
output;
end;
if found>1 then stop;
keep row variable;
run;
proc print data=want;
run;
IML version code. Just for fun :
data person; infile cards truncover; input Col1 $ Col2 $ Col3 $ Col4 $ Col5 $ Col6 $; datalines; John Sales John Sales John start John Sales John Sales John q John Sales start Sales John John Sales John Sales John d John Sales John Sales John d ; run; proc iml; use person; read all var _char_ into x; close person; s=ndx2sub(nrow(x)||ncol(x),loc(x='start'))[1,]; create want from s[colname={row column}]; append from s; close want; quit;
Xia Keshan
Does proc IML work with only SAS/IML ? Or can this be used in base SAS too?
Only in SAS/IML . If you have SAS University Edition (it is free) , you can run it .
How about PROC TRANSPOSE?
data person;
input Col1 $ Col2 $ Col3 $ Col4 $ Col5 $ Col6 $;
SeqNo = _n_;
datalines;
John Sales John Sales John start
John Sales John Sales John q
John Sales start Sales John
John Sales John Sales John d
John Sales John Sales John d
;
proc transpose;
var Col1-Col6;
by SeqNo;
run;
data want;
set _last_;
if Col1 = "start";
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.