BookmarkSubscribeRSS Feed
Anotherdream
Quartz | Level 8

Hello everyone. I have a dataset that contains up to X columns, each of which are a character column of length $50. Somewhere in the dataset is the word "start".  I need to figure out where the FIRST occurance of the word effective is (scanning row by row, from "left to right" (column 1- column xxx).

Then I need the output of the program to be the Column name (or can be number in dataset and I can figure out how to grab the column name) and the row where the word 'effective' was found.

I actually have no idea how to do this. I was thinking I could do some kind of "do" loop over the columns, checking each column for the word "effective", and then when it was found for the first time flag the column name... However the last step of the loop (outputting the column name) is not something i've ever seen done.

Does anyone have any idea on how to do this?  For an example of data and the output I would want please see the following sas datasteps.

data person;

   input Col1 $ Col2 $  Col3 $  Col4 $  Col5 $  Col6 $;

   datalines;

John Sales John Sales John Sales

John Sales John Sales John Sales

John Sales start Sales John

John Sales John Sales John Sales

John Sales John Sales John Sales

;

In this example the word start is in Col3 row 3, so I would want the output to simply be a dataset that looks like follows.

Data DesiredOutput;

input ColumnName $ RowNumber $;

datalines;

Col3 3

;

run;

If anyone has any ideas of questions on what I am looking for please let me know! I will continue trying to come up with something until anyone is able to respond.  Thanks again in advance for everyones help!

7 REPLIES 7
Reeza
Super User

WHICHC and VNAME function are the inefficient brutal way of doing it...

Anotherdream
Quartz | Level 8

I think the following code might solve my problem (in a very strange way).

data person;
   input Col1 $ Col2 $  Col3 $  Col4 $  Col5 $  Col6 $;
   datalines;
John Sales John Sales John start
John Sales John Sales John q
John Sales start Sales John
John Sales John Sales John d
John Sales John Sales John d
;

data mightwork;
set person;
array chararray{*} Col1 -- Col6;
i=1;
do while (chararray(i)~="start" and i<=5);
put i;
i+1;
end;
newvar=chararray(i);
run;

I think the reason it works is as follows.  it is a while loop, so it checks the first 5 columns and if it ever finds the word "start" it stop and the while condition is never entered into again. Therefore I on that row is exactly equal to the column number that the string was found in.

If the value is found in the LAST column, then the loop has a problem.  however I am defining a variable equal to "newvar=chararray(i)".  If it doesn't find the character in the first 5 fields, i will become 6 and the newvar will be equal to the last value found within the dataset.  I can then just write a subsetting data query that will check this newvar field for the string I want and thus subset the data (like so).

data answer(rename=(i=columnNumber));

set mightwork;

rownumber=_n_;

if newvar='start';

run;

This is a 'solution' but not one i'm particurally fond of due to how slow it is. Does anyone have any improvements over this solution?

Reeza thanks I've actually never heard of either of those functions I will look into them!

Reeza
Super User

No idea on efficiency here, but you can test that and let me know Smiley Happy

data person;

infile cards truncover;

   input Col1 $ Col2 $  Col3 $  Col4 $  Col5 $  Col6 $;

   datalines;

John Sales John Sales John start

John Sales John Sales John q

John Sales start Sales John

John Sales John Sales John d

John Sales John Sales John d

;

run;

data want;

set person;

array col(6) col1-col6;

found=whichc('start', of col1-col6);

if found>1 then do;

Row=_n_;

Variable=vname(col(found));

output;

end;

if found>1 then stop;

keep row variable;

run;

proc print data=want;

run;

Ksharp
Super User

IML version code. Just for fun :

data person;
infile cards truncover;
   input Col1 $ Col2 $  Col3 $  Col4 $  Col5 $  Col6 $;
   datalines;
John Sales John Sales John start
John Sales John Sales John q
John Sales start Sales John
John Sales John Sales John d
John Sales John Sales John d
;
run;


proc iml;
use person;
read all var _char_ into x;
close person;
s=ndx2sub(nrow(x)||ncol(x),loc(x='start'))[1,];
create want from s[colname={row column}];
append from s;
close want;
quit;


Xia Keshan

PoornimaRavishankar
Quartz | Level 8

Does proc IML work with only SAS/IML ? Or can this be used in base SAS too?

Ksharp
Super User

Only in SAS/IML . If you have SAS University Edition (it is free) , you can run it .

TomKari
Onyx | Level 15

How about PROC TRANSPOSE?

data person;

   input Col1 $ Col2 $  Col3 $  Col4 $  Col5 $  Col6 $;

   SeqNo = _n_;

   datalines;

John Sales John Sales John start

John Sales John Sales John q

John Sales start Sales John

John Sales John Sales John d

John Sales John Sales John d

;

proc transpose;

var Col1-Col6;

by SeqNo;

run;

data want;

set _last_;

if Col1 = "start";

run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 3263 views
  • 0 likes
  • 5 in conversation