Code to check dataset for specific value, and output Column and Row wh...

Anotherdream · Posted 04-22-2015 06:14 PM

Hello everyone. I have a dataset that contains up to X columns, each of which are a character column of length $50. Somewhere in the dataset is the word "start". I need to figure out where the FIRST occurance of the word effective is (scanning row by row, from "left to right" (column 1- column xxx).

Then I need the output of the program to be the Column name (or can be number in dataset and I can figure out how to grab the column name) and the row where the word 'effective' was found.

I actually have no idea how to do this. I was thinking I could do some kind of "do" loop over the columns, checking each column for the word "effective", and then when it was found for the first time flag the column name... However the last step of the loop (outputting the column name) is not something i've ever seen done.

Does anyone have any idea on how to do this? For an example of data and the output I would want please see the following sas datasteps.

data person;

input Col1 $ Col2 $ Col3 $ Col4 $ Col5 $ Col6 $;

datalines;

John Sales John Sales John Sales

John Sales start Sales John

John Sales John Sales John Sales

;

In this example the word start is in Col3 row 3, so I would want the output to simply be a dataset that looks like follows.

Data DesiredOutput;

input ColumnName $ RowNumber $;

datalines;

Col3 3

;

run;

If anyone has any ideas of questions on what I am looking for please let me know! I will continue trying to come up with something until anyone is able to respond. Thanks again in advance for everyones help!

Reeza · Posted 04-22-2015 06:22 PM

WHICHC and VNAME function are the inefficient brutal way of doing it...

Anotherdream · Posted 04-22-2015 06:30 PM

I think the following code might solve my problem (in a very strange way).

data person;
input Col1 $ Col2 $ Col3 $ Col4 $ Col5 $ Col6 $;
datalines;
John Sales John Sales John start
John Sales John Sales John q
John Sales start Sales John
John Sales John Sales John d
John Sales John Sales John d
;

data mightwork;
set person;
array chararray{*} Col1 -- Col6;
i=1;
do while (chararray(i)~="start" and i<=5);
put i;
i+1;
end;
newvar=chararray(i);
run;

I think the reason it works is as follows. it is a while loop, so it checks the first 5 columns and if it ever finds the word "start" it stop and the while condition is never entered into again. Therefore I on that row is exactly equal to the column number that the string was found in.

If the value is found in the LAST column, then the loop has a problem. however I am defining a variable equal to "newvar=chararray(i)". If it doesn't find the character in the first 5 fields, i will become 6 and the newvar will be equal to the last value found within the dataset. I can then just write a subsetting data query that will check this newvar field for the string I want and thus subset the data (like so).

data answer(rename=(i=columnNumber));

set mightwork;

rownumber=_n_;

if newvar='start';

run;

This is a 'solution' but not one i'm particurally fond of due to how slow it is. Does anyone have any improvements over this solution?

Reeza thanks I've actually never heard of either of those functions I will look into them!

Reeza · Posted 04-22-2015 06:44 PM

No idea on efficiency here, but you can test that and let me know

data person;

infile cards truncover;

input Col1 $ Col2 $ Col3 $ Col4 $ Col5 $ Col6 $;

datalines;

John Sales John Sales John start

John Sales John Sales John q

John Sales start Sales John

John Sales John Sales John d

;

run;

data want;

set person;

array col(6) col1-col6;

found=whichc('start', of col1-col6);

if found>1 then do;

Row=_n_;

Variable=vname(col(found));

output;

end;

if found>1 then stop;

keep row variable;

run;

proc print data=want;

run;

Ksharp · Posted 04-23-2015 09:13 AM

IML version code. Just for fun :

data person;
infile cards truncover;
   input Col1 $ Col2 $  Col3 $  Col4 $  Col5 $  Col6 $;
   datalines;
John Sales John Sales John start
John Sales John Sales John q
John Sales start Sales John
John Sales John Sales John d
John Sales John Sales John d
;
run;


proc iml;
use person;
read all var _char_ into x;
close person;
s=ndx2sub(nrow(x)||ncol(x),loc(x='start'))[1,];
create want from s[colname={row column}];
append from s;
close want;
quit;

Xia Keshan

PoornimaRavishankar · Posted 04-23-2015 10:52 AM

Does proc IML work with only SAS/IML ? Or can this be used in base SAS too?

Ksharp · Posted 04-24-2015 09:20 AM

Only in SAS/IML . If you have SAS University Edition (it is free) , you can run it .

TomKari · Posted 04-23-2015 02:52 PM

How about PROC TRANSPOSE?

data person;

input Col1 $ Col2 $ Col3 $ Col4 $ Col5 $ Col6 $;

SeqNo = _n_;

datalines;

John Sales John Sales John start

John Sales John Sales John q

John Sales start Sales John

John Sales John Sales John d

;

proc transpose;

var Col1-Col6;

by SeqNo;

run;

data want;

set _last_;

if Col1 = "start";

run;

Code to check dataset for specific value, and output Column and Row where it occurs

Re: Code to check dataset for specific value, and output Column and Row where it occurs

Re: Code to check dataset for specific value, and output Column and Row where it occurs

Re: Code to check dataset for specific value, and output Column and Row where it occurs

Re: Code to check dataset for specific value, and output Column and Row where it occurs

Re: Code to check dataset for specific value, and output Column and Row where it occurs

Re: Code to check dataset for specific value, and output Column and Row where it occurs

Re: Code to check dataset for specific value, and output Column and Row where it occurs

Classroom Training Available!