DATA Step, Macro, Functions and more

How to use an array while comparing data sets? #Advanced SAS

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 15
Accepted Solution

How to use an array while comparing data sets? #Advanced SAS

Hi All,

I have been facing a problem where I am trying to compare two datasets by passing array values for comparison to other data set:

The code that is giving me trouble is written below:

data web.new;

  array target_brand{135} $ 30 _temporary_;                                           /*****Defining an array target_brand*************/

IF _N_ = 1 then do i = 1 to 135;

set brands;

target_brand{i} = brand_names1;                                                       /*****Inserting values to array***********************/

put target_brand{i};

end;

set web.home2;                                                                               /*****Calling second data set where I have to compare array values************/

do i = 1 to 135;                                                                                  

bb = kindex(Keyword3,target_brand{i});                                                    

if bb ne 0 then do;

put i target_brand{i};

i = 140;

end;

end;

run;

ods html file = "test.xls";                                                     /*****Printing the output**********************/

proc print data = web.new;

run;

ods html close;

The image of the part of the output which is not giving me desired results is given below:

As you can see VIKING is present in the sec observation as well but the code is unable to capture it. Also my observation is that it captures only the last word in the Keyword3.

Untitled.png

Thanks in advance!


Accepted Solutions
Solution
‎09-16-2013 12:14 PM
Super Contributor
Posts: 339

Re: How to use an array while comparing data sets? #Advanced SAS

Your issue is extremely simple even though it took me like an hour to mumble and finally figure it out.

I'll drive your attention to one particular event for all of the working matches: VIKING is the last word in keyword3

SAS does not trim/compress strings unless specified otherwise. So what happens is, kindex function attempts to match "VIKING                           "

You can simply modify

bb = kindex(Keyword3,trim(target_brand{i})); 

and it should do the trick.

On a sidenote, you could also write, for readability,

do i = 1 to 135 until (coalesce(bb, 0) NE 0);                                                                                 

bb = kindex(Keyword3,target_brand{i});                                                   

end;

Coalesce is just to handle when bb is missing at the beginning of each new loop. It has the same logic as (bb NE 0 and bb NE .). Not sure what's most efficient.

Vince

View solution in original post


All Replies
Super Contributor
Posts: 297

Re: How to use an array while comparing data sets? #Advanced SAS

Can you please provide some example data for this scenario?

Occasional Contributor
Posts: 15

Re: How to use an array while comparing data sets? #Advanced SAS

Posted in reply to Scott_Mitchell

Ya sure Scott. PFA a sample csv files. home2.csv has 10 odd observation and one variable Keyword3 and brands.csv has two observations and one variable brand_names.  You have to adjust the "i" value in the loop according to the number of observation. Please let me know if you have any queries.

Thanks.

Super User
Posts: 19,787

Re: How to use an array while comparing data sets? #Advanced SAS

Can you post what your initial problem is?
Are you trying to identify words from one file in another for example?  Sample data is useful, in the post, actual example data, rather than a description of files.

Occasional Contributor
Posts: 15

Re: How to use an array while comparing data sets? #Advanced SAS

Yes Reeza, I am trying to find whether the word in the array is present in keyword3 (in another data set). If that is present then return me the position value. If the position returned is non zero then it comes out of the loop and reads next observation.

Thanks.

Super User
Posts: 19,787

Re: How to use an array while comparing data sets? #Advanced SAS

You data goes down (ie 1 variable) and in SAS an array goes across multiple columns, or a single row. I don't think this is the proper use of an array.

You could probably make it work, but it seems very inefficient.

Here's a sample:

proc sql noprint;

select name into :list_names separated by '", "'

from sashelp.class

    where age=13;

select count(*) into Smiley Surprisedbs_count from sashelp.class

    where age=13;

quit;

%put "&list_names";

data find_words;

    set sashelp.class;

    array search_words(&obs_count) $ _temporary_ ("&list_names");

    do i =1 to dim(search_words);

        if index( name, search_words(i))>0 then match=search_words(i);

    end;

run;

Super Contributor
Posts: 297

Re: How to use an array while comparing data sets? #Advanced SAS

Actually I believe that it is the most efficient method of lookup, but is quite memory intensive.

Check out the attached for more info.

http://www2.sas.com/proceedings/sugi26/p008-26.pdf

Super Contributor
Posts: 339

Re: How to use an array while comparing data sets? #Advanced SAS

Posted in reply to Scott_Mitchell

Scott, I SUGI 26 was in 2000 or 2001? That was before SAS developpers implemented hash objects. This paper essentially describes the hard coded version of hash hash objects back when you couldn't use the object already. The underlying concept of the hash object is that it builds a unique fast searchable key tied to a pointer on the data for that unique key. The object does the binary hashing discussed in the paper.

At this stage in computer technology evolution though, looping over 150 array cells versus performing a hash lookup on  a hash object with 150 cells is marginally better. He would need to tens of millions of keyword3 to notice any difference.

Nonetheless, a hash table approach to a similar problem has been posted elsewhere on these forums recently

Super Contributor
Posts: 297

Re: How to use an array while comparing data sets? #Advanced SAS

Posted in reply to Vince28_Statcan

Vince, I have read numerous articles that compare the key index method against hashes and the key index method still out performed the hash.

Super Contributor
Posts: 339

Re: How to use an array while comparing data sets? #Advanced SAS

Posted in reply to Scott_Mitchell

Guess I'll give it a fully detailed read then. It really just read to me like a hard coded hash. I don't think the gain would be worth the programming trouble for the projects I've been working on but I'm always interested in understanding that kind of stuff. Especially if it's transferable to other programming languages.

Thanks for the info. If you have any such recent paper comparing hash and index method, feel free to post the link back here Smiley Happy

Super Contributor
Posts: 297

Re: How to use an array while comparing data sets? #Advanced SAS

Posted in reply to Vince28_Statcan

My knowledge is based on reading articles, having not benchmarked the two approaches myself.  The attached article from 2007 simply states that temporary arrays are the fastest lookup method, without any data to back it up, but I have read several suggesting the same. http://www.nesug.info/Proceedings/nesug07/bb/bb16.pdf

I am only new to hash objects, so have spent the last wekk or so playing around with them.  I am fine with simple lookups and the like, but there is so much more to them than just that and the individual methods are doing my head in.  I do enjoy a challenge though, so hopefully will have them licked shortly and then can begin benchmarking.

Super User
Posts: 19,787

Re: How to use an array while comparing data sets? #Advanced SAS

Posted in reply to Scott_Mitchell

You're correct. Thanks.

The OP's code works in 0.01 seconds versus over 0.71 secs for my process. 

Super Contributor
Posts: 339

Re: How to use an array while comparing data sets? #Advanced SAS

Reeza, _TEMPORARY_ array do not reference particular columns. They're a mean to store values in memory with no special indexing (like that of hash tables) to search them. Arrays that are not temporary and are specified with variable ranges or name list are not temporary and act basically as an addition to the data vector. Oddly enough array _temporary_ don't get their values reinitiated to missing at each data loop. They remain in memory as hash objects until the end of the data step. That is, they are not really temporary to the iterations of the data step (but they are temporary to the output dataset as they are not automatically output since they are not tied to any data vector variables).

The above syntax for array _temporary_ is definitely right. I use temporary arrays as small dictionaries all the time.

As to the OP, there was a thread about 2 months back that depicted a different approach. That of reading your keyword3, word by word and then doing a match lookup to a hash table acting as a dictionary (likely more efficient than kindex or find functions). I'll try to find the thread back in my activity and link it if you want. But the TRIM() I mentionned above should make your current code work as intended

Super Contributor
Posts: 339

Re: How to use an array while comparing data sets? #Advanced SAS

Posted in reply to Vince28_Statcan

As mentioned above, here's a thread with some stress testing with examples using hash tables to achieve your desired result

https://communities.sas.com/thread/46818

Again though, with marginally small dictionary of good words, you won't see any difference with your current _temporary_ array approach.

Solution
‎09-16-2013 12:14 PM
Super Contributor
Posts: 339

Re: How to use an array while comparing data sets? #Advanced SAS

Your issue is extremely simple even though it took me like an hour to mumble and finally figure it out.

I'll drive your attention to one particular event for all of the working matches: VIKING is the last word in keyword3

SAS does not trim/compress strings unless specified otherwise. So what happens is, kindex function attempts to match "VIKING                           "

You can simply modify

bb = kindex(Keyword3,trim(target_brand{i})); 

and it should do the trick.

On a sidenote, you could also write, for readability,

do i = 1 to 135 until (coalesce(bb, 0) NE 0);                                                                                 

bb = kindex(Keyword3,target_brand{i});                                                   

end;

Coalesce is just to handle when bb is missing at the beginning of each new loop. It has the same logic as (bb NE 0 and bb NE .). Not sure what's most efficient.

Vince

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 20 replies
  • 584 views
  • 8 likes
  • 5 in conversation