KDA
Calcite | Level 5
I have a dataset with 14000 obs and 200 variables.
There is a set of 54 character variables, named goname1-goname54, for which I want to set up an array and then search for a word/phrase within the variable values for goname1-goname54. Previously, I've successfully used both the array and the index functions, but I am unable to figure out how to do both in the same step.
For example, I want to create a new variable, called POT, that will be an index of whether any of the variables, goname1-goname54, contain the word 'potassium' (i.e., POT=1 if this word appears, and POT=0 if none of the variables contain this word.) In an attempt to do so, I've written the following program, but get all zeros for POT (and I know this is not correct):

data new;
set old;
array goname{1:54} $ 255 goname1-goname54;
flag = "potassium";
POT=0;
do i=1 to 54;
POT = index(goname{i}, flag);
end;
run;

I've also tried listing out all of the variable names (i.e., goname1 goname2 goname3...) after the array length statement, and also tried omitting the element length, among other attempts to manipulate the order of the program.
Any advice?
KDA
deleted_user
Not applicable
An excellent SUGI 29 paper (by Paul Dorfman and a co-author) discusses how to optimise this kind of search.
In that paper, "A-P-P Advanced Data Management Functions" (http://www2.sas.com/proceedings/sugi29/264-29.pdf), the impressive statement that addresses this challenge is [pre]
found = ^^ indexw (peekc (addr(a), 81 ), srchfor) ; * ^^ normalizes to std boolean ;[/pre]
Implemented for your challenge, it would look something like [pre]
data result_data ;
array goname{54} $ 255 ;
set old_wide_large_enough_data ;
srchfor = "potassium" ;
* buffer length = 54 elements x 255 bytes each, matching the ARRAY statement ;
POT= ^^ indexw( peekc( addr( goname1 ), %eval(54*255) ), srchfor ) ;
run;[/pre]
The array is defined before the data is SET to ensure these variables are all together for the PEEKC() function.

Much more clarification can be found in that paper, including a pointer to the alternative to peekc() for 64bit platforms.

PeterC.
BPD
Obsidian | Level 7
KDA,

You could just insert the line IF POT = 1 THEN LEAVE into the do loop. Because INDEX returns the position of the match rather than 1, compare its result with 0 so that POT is a 0/1 flag. VIZ:

data new;
set old;
array goname{1:54} $ 255 goname1-goname54;
flag = "potassium";
POT=0;
do i=1 to 54;
POT = index(goname{i}, flag) > 0;
IF POT = 1 THEN LEAVE;
end;
run;

This stops POT subsequently being reset to 0 by a later array element, which may be all that's stopping the code working now.

Regards,

BPD
Patrick
Opal | Level 21
Hi KDA

I have no doubt that Peter and Paul's solution can't be beaten with regard to performance.
Peter: Thanks for posting this link. Very interesting.

I believe the code below would also work for the example given:

data new;
set old;
flag = "potassium";
pot= find(cats(of goname1-goname54),flag)>0;
run;


HTH
Patrick
Peter_C
Rhodochrosite | Level 12
Hi Patrick

>
> Peter: Thanks for posting this link. Very interesting.
>
> I believe the code below would also work for the example given:
>
> data new;
> set old;
> flag = "potassium";
> pot= find(cats(of goname1-goname54),flag)>0;

* issues like trailing blanks in FLAG, and rejecting substrings during the search, are addressed in SAS 9.2 by the FINDW() function.
For SAS 9 platforms where FINDW() is not available, the following workaround looks tedious ;
pot = find( '|'!! catx( '||', of goname1-goname54) !!'|', cats('|',flag,'|') )>0 ;

> run;
>

The enhanced functions in SAS9.2 are really making a difference.

regards
peterC
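
The FINDW() approach mentioned above is not spelled out in the thread, so here is a minimal sketch of how it might look on SAS 9.2 or later. It assumes that goname1-goname54 already exist as character variables in OLD, that words are separated by blanks only, and that a case-insensitive match is wanted (the 'i' modifier); none of those choices comes from the original posts.

```sas
data new;
  set old;
  /* lengths are taken from OLD, so no $ length is given here */
  array goname{*} goname1-goname54;
  POT = 0;
  do i = 1 to dim(goname);
    /* FINDW returns the position of the whole word, or 0 if it is absent; */
    /* ' ' is the only word delimiter and 'i' ignores case (assumptions)   */
    if findw(goname{i}, 'potassium', ' ', 'i') > 0 then do;
      POT = 1;
      leave;   /* stop at the first match so POT is not overwritten */
    end;
  end;
  drop i;
run;
```

With a blank as the only delimiter, a value such as "potassium," (with a trailing comma) would not match; adding punctuation characters to the third argument would be one way to handle that.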
Ksharp
Super User
Hi. I think you should add
[pre]
retain flag;
[/pre]


Because the flag variable does not come from the data set, it will be set to missing when the DATA step enters the next iteration.


Ksharp
Cynthia_sas
SAS Super FREQ
Hi:
You are correct that FLAG is initialized to MISSING at the top of the program for each iteration of the DATA step; however, FLAG is also assigned the value 'potassium' on each iteration. So, in that respect, the original code was OK. The problem was more likely one of the other issues noted.

It would only be better to use a retain for the FLAG variable if there was also a statement that assigned the value to FLAG only 1 time...something like:
[pre]
retain flag;
if _n_ = 1 then flag = 'potassium';
[/pre]

cynthia

ps...you can prove to yourself that potassium gets set on every iteration of the DATA step by using a test program that reads ANY data and uses similar logic:
[pre]
data new;
length flag $9;
set sashelp.class;
put 'before assignment statement: ' _n_= flag=;

flag = "potassium";
put 'after assignment statement...' ;
put _n_= name= flag=;
run;

before assignment statement: _N_=1 flag=
after assignment statement...
_N_=1 Name=Alfred flag=potassium
before assignment statement: _N_=2 flag=
after assignment statement...
_N_=2 Name=Alice flag=potassium
before assignment statement: _N_=3 flag=
after assignment statement...
_N_=3 Name=Barbara flag=potassium
before assignment statement: _N_=4 flag=
after assignment statement...
_N_=4 Name=Carol flag=potassium
before assignment statement: _N_=5 flag=
after assignment statement...
_N_=5 Name=Henry flag=potassium
before assignment statement: _N_=6 flag=
after assignment statement...
_N_=6 Name=James flag=potassium
before assignment statement: _N_=7 flag=
after assignment statement...
_N_=7 Name=Jane flag=potassium
before assignment statement: _N_=8 flag=
after assignment statement...
_N_=8 Name=Janet flag=potassium
before assignment statement: _N_=9 flag=
after assignment statement...
_N_=9 Name=Jeffrey flag=potassium
before assignment statement: _N_=10 flag=
after assignment statement...
_N_=10 Name=John flag=potassium
before assignment statement: _N_=11 flag=
after assignment statement...
_N_=11 Name=Joyce flag=potassium
before assignment statement: _N_=12 flag=
after assignment statement...
_N_=12 Name=Judy flag=potassium
before assignment statement: _N_=13 flag=
after assignment statement...
_N_=13 Name=Louise flag=potassium
before assignment statement: _N_=14 flag=
after assignment statement...
_N_=14 Name=Mary flag=potassium
before assignment statement: _N_=15 flag=
after assignment statement...
_N_=15 Name=Philip flag=potassium
before assignment statement: _N_=16 flag=
after assignment statement...
_N_=16 Name=Robert flag=potassium
before assignment statement: _N_=17 flag=
after assignment statement...
_N_=17 Name=Ronald flag=potassium
before assignment statement: _N_=18 flag=
after assignment statement...
_N_=18 Name=Thomas flag=potassium
before assignment statement: _N_=19 flag=
after assignment statement...
_N_=19 Name=William flag=potassium
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.NEW has 19 observations and 6 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds

[/pre]
Ksharp
Super User
Hi.
You are right. I think I am too sensitive.
deleted_user
Not applicable
Be careful when using string-related functions such as PEEKC and SCAN on array data if the array is particularly large, i.e. number of elements * element length.

If the overall array size is above 32K, the functions end up producing unexpected results.

For example, try changing the number of elements in the examples above to 200 and you'll notice that the functions fail to work correctly.
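
One way to check for this before relying on a whole-buffer search is to add up the defined lengths of the array elements. The sketch below is only an illustration: it assumes the same OLD data set and goname1-goname54 variables as the examples above, and it takes 32,767 bytes (the maximum length of a SAS character value) as the threshold behind the "32K" figure.

```sas
data _null_;
  set old (obs=1);                      /* one observation is enough to inspect lengths */
  array goname{*} goname1-goname54;
  total_bytes = 0;
  do i = 1 to dim(goname);
    /* VLENGTH gives the defined (compile-time) length of each element */
    total_bytes = total_bytes + vlength(goname{i});
  end;
  put 'NOTE: combined length of the array elements is ' total_bytes 'bytes.';
  if total_bytes > 32767 then
    put 'NOTE: this exceeds 32,767 bytes, so search each element separately.';
run;
```

If the total is over the limit, looping over the elements and searching each one individually avoids building a single oversized string.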
