DATA Step, Macro, Functions and more

Pre-Labor Day Coding Challenge

Reply
PROC Star
Posts: 7,363

Pre-Labor Day Coding Challenge

In a recent thread ( http://communities.sas.com/message/104261#104261 ) the OP had records with four variables and, if I correctly understood the question, wanted to know if one of the entries had been used previously and if so assign the same household # that had been used before and, if not, to increment the total households by 1 and assign it.

Yes, I fully realize that there are a number of solutions to such a problem, but many require skills that I think are currently beyond most SAS users' understanding.

Thus, I have an alternative approach that I don't know is even doable, i.e., I wouldn't know how to do it in the way that I envision.  Proc Format would be an easy way to have a lookup table that most SAS users could relate to.  Can a format be created dynamically?  That is, starting with an empty format, check the first record and if no match was found in the format, increment a counter and assign the four entries to the format with the incremented counter.

Then, for the next record, test the values with the format, assign the household id if found and, if not, increment the counter and add the new entries to the format.

Thus the challenge: is that doable and, if so, provide a working example using the 100+ record dataset that the OP provided in the above mentioned thread.

Super User
Posts: 9,681

Pre-Labor Day Coding Challenge

Art.T

OP not only want one of the entries had been used previously ,but also been used afterwards.

I do not think proc format can do it, because it is static not dynamic. And You should exclude the current obs and

blank , tab character ..... such invisible character.

Because OP 's variables are all character ,It make more complex for comparing the character.

Ksharp

Frequent Contributor
Posts: 104

Re: Pre-Labor Day Coding Challenge

Have a good Labour Day weekend, Dr. Art, from a fellow Torontonian.

These connected subgraph problems happens in householding a lot.  There are commercial software that does this, but if one can't afford the cost, then reinventing the hammer is necessary.

I have a macro solution from a ways back that works, but not via PROC FORMAT.  Thought I'd just throw it here for fun.  As for the idea of dynamic lookup, HASH tables probably work better, but looks more daunting to the novice programmer.  Perhaps Ksharp can provide working code on that Smiley Wink

Attachment
Attachment
PROC Star
Posts: 7,363

Re: Pre-Labor Day Coding Challenge

I'm going to look at your code later this afternoon.  Ksharp did provide a response to the thread that originally got me thinking about this, involving a hash of course, but I found that the code didn't correctly address the OP's requirements.

I modified it myself and I think that the following works.  I definitely have to learn more about using a hash to solve problems like this.  I originally attempted to build a datastep solution, using call execute to invoke a macro that called proc format but, as expected, the calls to proc format had to wait until after the datastep which made them useless for the task.

My modified version of Ksharp's code:

data have;

  infile cards dsd dlm='|' truncover;

  input (pan1 add1 pan2 pan3) (: $40.);

  cards;

M5669K|4HH5BFWBBMBY48Y7P8S|P5P8YBBMBY48Y7P8S0005D|

I2175Q|P5P8YBBMBY48Y7P8S0005D||

S1495P|8BWP_Y8V3MYVWBBMBBMB||

G4909P|WN8Y4YFN48FYP884P83Y4||

|S1495P||

S0560L|488Y8N8N8H-K56H4YP8P||

F0994B|R8YPF8Y4P-0BWP8BBMBY||

H6408L|HICPM4BBPWFP4WV4MP8YBB||

;

data want (keep=pan1 pan2 pan3 add1 household);

  if _n_ eq 1 then do;

    declare hash ha(hashexp: 16);

    ha.definekey('key');

    ha.definedata('hhold');

    ha.definedone();

  end;

  set have;

  array _house{*} $ 40 pan1--pan3;

  do i=1 to dim(_house);

    key=_house{i};

    call missing(hhold);

    rc=ha.find();

    if rc=0 then do;

      found=1;

      household=hhold;

      leave;

    end;

  end;

  if not found then do;

    n+1;

    household=n;

  end;

  do j=1 to dim(_house);

    if not missing(_house{j}) then do;

      key=_house{j};

      hhold=household;

      ha.replace();

    end;

  end;

run;

Frequent Contributor
Posts: 104

Re: Pre-Labor Day Coding Challenge

My code is provided for "entertainment value".  It was written many moons ago, long before hash tables were available.  It just shows when needs arise, people find ways to solve it regardless of whether it is the right way, the elegant way, or not.  Need to get stuff done is a very powerful motivator.

PROC Star
Posts: 7,363

Re: Pre-Labor Day Coding Challenge

And I was indeed quite entertained!  I didn't post my own solution because it turned out to be far too sloppy given the availability of hash tables.  What I did was take two passes thru the data and, like you, in a macro.

The first pass was actually N passes, reading one record each time and appending the results to the format. Then a second pass just looking up the formatted values.  It worked, but not an approach I would recommend today.

I totally agree, though, motivation is an extremely powerful motivator and sometimes results in capabilities we didn't realize we could achieve.  Since you are going to be at the September 16th TASS (Toronto Area SAS Society) meeting, you'll get to see one of my examples of just such an effort.

Ask a Question
Discussion stats
  • 5 replies
  • 189 views
  • 0 likes
  • 3 in conversation