DATA Step, Macro, Functions and more

PUZZLE - finding anagram groups

Reply
Super User
Posts: 9,687

PUZZLE - finding anagram groups

Since I saw FriedEgg post a question at this forum ,which stimulate me to post another puzzle.

It is from JAVA Tutorials ( HashMap ), which is very interesting problem.

I can code it by using SAS ,but I will not code it for the sake of my little spare time. And I believe someone here can give some answer.

I will not mark someone' answer be correct too, I have given the answer at following . DON'T CHEAT . Smiley Happy

What we should learned is algorithm from Java code.

There is a standard trick for finding anagram groups: For each word in the dictionary, alphabetize the letters in the word (that is, reorder the word's letters into alphabetical order) and put an entry into a multimap, mapping the alphabetized word to the original word. For example, the word bad causes an entry mapping abd into bad to be put into the multimap. A moment's reflection will show that all the words to which any given key maps form an anagram group. It's a simple matter to iterate over the keys in the multimap, printing out each anagram group that meets the size constraint.

The following program is a straightforward implementation of this technique.

import java.util.*;
import java.io.*;

public class Anagrams {
    public static void main(String[] args) {
        int minGroupSize = Integer.parseInt(args[1]);

        // Read words from file and put into a simulated multimap
        Map> m = new HashMap>();

        try {
            Scanner s = new Scanner(new File(args[0]));
            while (s.hasNext()) {
                String word = s.next();
                String alpha = alphabetize(word);
                List l = m.get(alpha);
                if (l == null)
                    m.put(alpha, l=new ArrayList());
                l.add(word);
            }
        } catch (IOException e) {
            System.err.println(e);
            System.exit(1);
        }

        // Print all permutation groups above size threshold
        for (List l : m.values())
            if (l.size() >= minGroupSize)
                System.out.println(l.size() + ": " + l);
    }

    private static String alphabetize(String s) {
        char[] a = s.toCharArray();
        Arrays.sort(a);
        return new String(a);
    }
}

Running this program on a 173,000-word dictionary file with a minimum anagram group size of eight produces the following output.

9: [estrin, inerts, insert, inters, niters, nitres, sinter,

  triens, trines]

8: [lapse, leaps, pales, peals, pleas, salep, sepal, spale]

8: [aspers, parses, passer, prases, repass, spares, sparse,

  spears]

10: [least, setal, slate, stale, steal, stela, taels, tales,

  teals, tesla]

8: [enters, nester, renest, rentes, resent, tenser, ternes,

  treens]

8: [arles, earls, lares, laser, lears, rales, reals, seral]

8: [earings, erasing, gainers, reagins, regains, reginas,

  searing, seringa]

8: [peris, piers, pries, prise, ripes, speir, spier, spire]

12: [apers, apres, asper, pares, parse, pears, prase, presa,

  rapes, reaps, spare, spear]

11: [alerts, alters, artels, estral, laster, ratels, salter,

  slater, staler, stelar, talers]

9: [capers, crapes, escarp, pacers, parsec, recaps, scrape,

  secpar, spacer]

9: [palest, palets, pastel, petals, plates, pleats, septal,

  staple, tepals]

9: [anestri, antsier, nastier, ratines, retains, retinas,

  retsina, stainer, stearin]

8: [ates, east, eats, etas, sate, seat, seta, teas]

8: [carets, cartes, caster, caters, crates, reacts, recast,

  traces]

Many of these words seem a bit bogus, but that's not the program's fault; they're in the dictionary file. Here's the dictionary file we used. It was derived from the Public Domain ENABLE benchmark reference word list.

Have a NICE day !

Ksharp

Attachment
Respected Advisor
Posts: 3,124

Re: PUZZLE - finding anagram groups

Hi Ksharp,

I am happy to announce that it is not that hard for SAS to tackle this one either. BTW, the dictionary file only has 80,368 words instead.

data have;

infile 'c:\temp\dictionary.txt' truncover;

input var :$100.;

run;

data want;

array st(8) $1. _temporary_;

length new_var $8;

set have;

do _n_=1 to lengthn(var);

st(_n_)=substr(compress(var),_n_,1);

end;

call sortc(of st(*));

new_var=cats(of st(*));

call missing(of st(*));

run;

proc sql;

select * from want group by new_var having count(var)>=9;

quit;

Regards,

Haikuo

Super User
Posts: 9,687

Re: PUZZLE - finding anagram groups

Bian Hai Kuo,

I am happy to announce that it is not that hard for SAS to tackle this one either.

Yeah, you got it.Acutally it is more easy for SAS than JAVA .That is the advantage of SAS at the field of DATA Processing .

Maybe you can code it like below  to make it more less typing .

data have;

infile 'c:\temp\dictionary.txt' truncover;

input ( var1 - var50) ( $1.) ;

........

call sortc(of varSmiley Happy.

..

run;

Ksharp

Respected Advisor
Posts: 4,654

Re: PUZZLE - finding anagram groups

Nice exercise! And SAS is pretty good at statistics too, compared to Java Smiley Happy.

I hope you will eventually find a good job doing SAS. Otherwise, that would be a waste of a great talent.

PG

PG
Super User
Posts: 9,687

Re: PUZZLE - finding anagram groups

Thank you. PG,

I wish SAS can develop a language such as Java (OOP language) to allow us to create some GUI like JavaEE or JavaFX . That would be perfect for SAS.

I am afraid I will let you disappoint . Nowadays, I am learning JavaEE, almost stop learning SAS, But I don't want quit, so it is the reason why I am still at this forum.

In the future, I hope I can master JavaEE and find a job about Java Programmer and make a living by Java.

Ksharp

Trusted Advisor
Posts: 1,300

Re: PUZZLE - finding anagram groups

Nice posting Ksharp.  Reminds me of the word puzzles I had posted a long time ago.

My concept is similar to Hai.kuo's but I solve in one data step.  This way is pretty memory intense since I also retain the string of words, which slows it down.  Removing that piece should increase processing speed.

filename in url 'http://docs.oracle.com/javase/tutorial/collections/interfaces/examples/dictionary.txt';

data _null_;

array letter[8] $ 1 _temporary_;

length word new_word $ 8 words $ 76 count 3;

call missing(words,count);

retain addrlong dim;

addrlong=addrlong(letter[1]);

dim=dim(letter);

declare hash ha();

ha.definekey('new_word');

ha.definedata('new_word','words','count');

ha.definedone();

do until(eof);

call missing(of letter

  • );
  • infile in end=eof truncover;

    input word $;

    call pokelong(strip(word),addrlong,dim);

    call sortc(of letter

  • );
  • new_word=strip(peekclong(addrlong,dim));

    rc=ha.check();

    if rc then ha.add(key:new_word,data:new_word,data:word,data:1);

    else do;

    ha.find();  

    ha.replace(key:new_word,data:new_word,data:catx(' ',words,word),data:count+1);

    end;

    end;

    ha.output(dataset:'blah(where=(count>7))');

    stop;

    run;

    proc print data=blah;

    var count words;

    run;

    Regular Contributor
    Posts: 222

    Re: PUZZLE - finding anagram groups

    I believe SAS web app should have more awesome  functionality combine with java language in the near future.

    Thanks for the interesting topic.

    Super User
    Posts: 9,687

    Re: PUZZLE - finding anagram groups

    SAS web app is a IDE tool  for Java from SAS ? If I remember correctly.

    Trusted Advisor
    Posts: 1,300

    Re: PUZZLE - finding anagram groups

    I think you are thinking of the SAS App Dev Studio application, it is basically a wrapper on top of the Eclipse IDE that has some add-on's specific to aiding in developing applications using the SAS Integration API's.

    Super User
    Posts: 9,687

    Re: PUZZLE - finding anagram groups

    Matt,

    Yeah, I remember that puzzle you posted before. And SQL is also a good way.

    I can't believe you find the origin URL of this question. Do you switch to learn Java now ?

    'http://docs.oracle.com/javase/tutorial/collections/interfaces/examples/dictionary.txt';


    Ksharp

    Trusted Advisor
    Posts: 1,300

    Re: PUZZLE - finding anagram groups

    KSharp, I have known JAVA a long time, longer than I have used SAS.  I am much more comfortable working in SAS these days, however.  Occasionally I have to mess with the J2EE applications that work with SAS Integration Technology APIs in JAVA.  The same could be done with JavaFX as well.  Through data-step java objects you could accomplish the inverse as well.

    Super User
    Posts: 9,687

    Re: PUZZLE - finding anagram groups

    "Through data-step java objects you could accomplish the inverse as well."

    Unfortunately, data-step java interface is only supported by several primitive type object ,such as int string double float ..........., he can't return all of immense object from JAVA like DATE ,SWING .............

    Ksharp

    Trusted Advisor
    Posts: 1,300

    Re: PUZZLE - finding anagram groups

    Ask a Question
    Discussion stats
    • 12 replies
    • 1205 views
    • 6 likes
    • 5 in conversation