BookmarkSubscribeRSS Feed
Ksharp
Super User

Since I saw FriedEgg post a question at this forum ,which stimulate me to post another puzzle.

It is from JAVA Tutorials ( HashMap ), which is very interesting problem.

I can code it by using SAS ,but I will not code it for the sake of my little spare time. And I believe someone here can give some answer.

I will not mark someone' answer be correct too, I have given the answer at following . DON'T CHEAT . Smiley Happy

What we should learned is algorithm from Java code.

There is a standard trick for finding anagram groups: For each word in the dictionary, alphabetize the letters in the word (that is, reorder the word's letters into alphabetical order) and put an entry into a multimap, mapping the alphabetized word to the original word. For example, the word bad causes an entry mapping abd into bad to be put into the multimap. A moment's reflection will show that all the words to which any given key maps form an anagram group. It's a simple matter to iterate over the keys in the multimap, printing out each anagram group that meets the size constraint.

The following program is a straightforward implementation of this technique.

import java.util.*;
import java.io.*;

public class Anagrams {
    public static void main(String[] args) {
        int minGroupSize = Integer.parseInt(args[1]);

        // Read words from file and put into a simulated multimap
        Map> m = new HashMap>();

        try {
            Scanner s = new Scanner(new File(args[0]));
            while (s.hasNext()) {
                String word = s.next();
                String alpha = alphabetize(word);
                List l = m.get(alpha);
                if (l == null)
                    m.put(alpha, l=new ArrayList());
                l.add(word);
            }
        } catch (IOException e) {
            System.err.println(e);
            System.exit(1);
        }

        // Print all permutation groups above size threshold
        for (List l : m.values())
            if (l.size() >= minGroupSize)
                System.out.println(l.size() + ": " + l);
    }

    private static String alphabetize(String s) {
        char[] a = s.toCharArray();
        Arrays.sort(a);
        return new String(a);
    }
}

Running this program on a 173,000-word dictionary file with a minimum anagram group size of eight produces the following output.

9: [estrin, inerts, insert, inters, niters, nitres, sinter,

  triens, trines]

8: [lapse, leaps, pales, peals, pleas, salep, sepal, spale]

8: [aspers, parses, passer, prases, repass, spares, sparse,

  spears]

10: [least, setal, slate, stale, steal, stela, taels, tales,

  teals, tesla]

8: [enters, nester, renest, rentes, resent, tenser, ternes,

  treens]

8: [arles, earls, lares, laser, lears, rales, reals, seral]

8: [earings, erasing, gainers, reagins, regains, reginas,

  searing, seringa]

8: [peris, piers, pries, prise, ripes, speir, spier, spire]

12: [apers, apres, asper, pares, parse, pears, prase, presa,

  rapes, reaps, spare, spear]

11: [alerts, alters, artels, estral, laster, ratels, salter,

  slater, staler, stelar, talers]

9: [capers, crapes, escarp, pacers, parsec, recaps, scrape,

  secpar, spacer]

9: [palest, palets, pastel, petals, plates, pleats, septal,

  staple, tepals]

9: [anestri, antsier, nastier, ratines, retains, retinas,

  retsina, stainer, stearin]

8: [ates, east, eats, etas, sate, seat, seta, teas]

8: [carets, cartes, caster, caters, crates, reacts, recast,

  traces]

Many of these words seem a bit bogus, but that's not the program's fault; they're in the dictionary file. Here's the dictionary file we used. It was derived from the Public Domain ENABLE benchmark reference word list.

Have a NICE day !

Ksharp

12 REPLIES 12
Haikuo
Onyx | Level 15

Hi Ksharp,

I am happy to announce that it is not that hard for SAS to tackle this one either. BTW, the dictionary file only has 80,368 words instead.

data have;

infile 'c:\temp\dictionary.txt' truncover;

input var :$100.;

run;

data want;

array st(8) $1. _temporary_;

length new_var $8;

set have;

do _n_=1 to lengthn(var);

st(_n_)=substr(compress(var),_n_,1);

end;

call sortc(of st(*));

new_var=cats(of st(*));

call missing(of st(*));

run;

proc sql;

select * from want group by new_var having count(var)>=9;

quit;

Regards,

Haikuo

Ksharp
Super User

Bian Hai Kuo,

I am happy to announce that it is not that hard for SAS to tackle this one either.

Yeah, you got it.Acutally it is more easy for SAS than JAVA .That is the advantage of SAS at the field of DATA Processing .

Maybe you can code it like below  to make it more less typing .

data have;

infile 'c:\temp\dictionary.txt' truncover;

input ( var1 - var50) ( $1.) ;

........

call sortc(of var:).

..

run;

Ksharp

PGStats
Opal | Level 21

Nice exercise! And SAS is pretty good at statistics too, compared to Java Smiley Happy.

I hope you will eventually find a good job doing SAS. Otherwise, that would be a waste of a great talent.

PG

PG
Ksharp
Super User

Thank you. PG,

I wish SAS can develop a language such as Java (OOP language) to allow us to create some GUI like JavaEE or JavaFX . That would be perfect for SAS.

I am afraid I will let you disappoint . Nowadays, I am learning JavaEE, almost stop learning SAS, But I don't want quit, so it is the reason why I am still at this forum.

In the future, I hope I can master JavaEE and find a job about Java Programmer and make a living by Java.

Ksharp

FriedEgg
SAS Employee

Nice posting Ksharp.  Reminds me of the word puzzles I had posted a long time ago.

My concept is similar to Hai.kuo's but I solve in one data step.  This way is pretty memory intense since I also retain the string of words, which slows it down.  Removing that piece should increase processing speed.

filename in url 'http://docs.oracle.com/javase/tutorial/collections/interfaces/examples/dictionary.txt';

data _null_;

array letter[8] $ 1 _temporary_;

length word new_word $ 8 words $ 76 count 3;

call missing(words,count);

retain addrlong dim;

addrlong=addrlong(letter[1]);

dim=dim(letter);

declare hash ha();

ha.definekey('new_word');

ha.definedata('new_word','words','count');

ha.definedone();

do until(eof);

call missing(of letter

  • );
  • infile in end=eof truncover;

    input word $;

    call pokelong(strip(word),addrlong,dim);

    call sortc(of letter

  • );
  • new_word=strip(peekclong(addrlong,dim));

    rc=ha.check();

    if rc then ha.add(key:new_word,data:new_word,data:word,data:1);

    else do;

    ha.find();  

    ha.replace(key:new_word,data:new_word,data:catx(' ',words,word),data:count+1);

    end;

    end;

    ha.output(dataset:'blah(where=(count>7))');

    stop;

    run;

    proc print data=blah;

    var count words;

    run;

    Mike_Davis
    Fluorite | Level 6

    I believe SAS web app should have more awesome  functionality combine with java language in the near future.

    Thanks for the interesting topic.

    Ksharp
    Super User

    SAS web app is a IDE tool  for Java from SAS ? If I remember correctly.

    FriedEgg
    SAS Employee

    I think you are thinking of the SAS App Dev Studio application, it is basically a wrapper on top of the Eclipse IDE that has some add-on's specific to aiding in developing applications using the SAS Integration API's.

    Ksharp
    Super User

    Matt,

    Yeah, I remember that puzzle you posted before. And SQL is also a good way.

    I can't believe you find the origin URL of this question. Do you switch to learn Java now ?

    'http://docs.oracle.com/javase/tutorial/collections/interfaces/examples/dictionary.txt';


    Ksharp

    FriedEgg
    SAS Employee

    KSharp, I have known JAVA a long time, longer than I have used SAS.  I am much more comfortable working in SAS these days, however.  Occasionally I have to mess with the J2EE applications that work with SAS Integration Technology APIs in JAVA.  The same could be done with JavaFX as well.  Through data-step java objects you could accomplish the inverse as well.

    Ksharp
    Super User

    "Through data-step java objects you could accomplish the inverse as well."

    Unfortunately, data-step java interface is only supported by several primitive type object ,such as int string double float ..........., he can't return all of immense object from JAVA like DATE ,SWING .............

    Ksharp

    SAS Innovate 2025: Save the Date

     SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

    Save the date!

    How to Concatenate Values

    Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

    Find more tutorials on the SAS Users YouTube channel.

    SAS Training: Just a Click Away

     Ready to level-up your skills? Choose your own adventure.

    Browse our catalog!

    Discussion stats
    • 12 replies
    • 3109 views
    • 6 likes
    • 5 in conversation