BookmarkSubscribeRSS Feed
R_Win
Calcite | Level 5

obs a1 a2 a3 a4 a5 a6  R_id
1 1 a 5 7 8 9 1
2 . . . . . . 2
3 11 b 12 13 14 15 3
4 9 c 100 101 102 103 4
5 104 c 106 107 108 109 4


obs=1 all are unique so r_id=1
obs=2 all are unique so r_id=2
obs=3 all are unique so r_id=2
obs=4 as a5 of obs1 and obs 4 of a1 having link
obs=5 as a2 of obs4 and obs 5 having linkage so r_id=4

i want like this

13 REPLIES 13
art297
Opal | Level 21

Your question is very similar to one asked a few months ago.  Take a look at: http://communities.sas.com/message/105224#105224

Ksharp
Super User

Your post is ambiguous.

What logic you want to apply to variable R_id.

If obs 4 has not linkage with obs 2, what value of R_id will be set for obs 4?

R_Win
Calcite | Level 5

obs a1 a2 a3 a4 a5 a6  R_id
1 1 a 5 7 8 9 1
2 . . . . . . 2
3 11 b 12 13 14 15 3
4 9 c 100 101 102 103 1
5 104 c 106 107 108 109 1


obs=1 all are unique so r_id=1
obs=2 all are unique so r_id=2
obs=3 all are unique so r_id=3
obs=4 as a5 of obs1 and obs 4 of a1 having link r_id=1
obs=5 as a2 of obs4 and obs 5 having linkage so r_id=1

sory i have corrected it

R_Win
Calcite | Level 5

obs a1 a2 a3 a4 a5 a6  R_id
1 1 a 5 7 8 9 1
2 . . . . . . 2
3 11 b 12 13 14 15 3
4 9 c 100 101 102 103 1
5 104 c 106 107 108 109 1


obs=1 all are unique so r_id=1
obs=2 all are unique so r_id=2
obs=3 all are unique so r_id=3
obs=4 as a5 of obs1 and obs 4 of a1 having link r_id=1
obs=5 as a2 of obs4 and obs 5 having linkage so r_id=1

art297
Opal | Level 21

Did you look at the post I suggested?  It sure seems like the same problem.  Try the following:

data have;

  input (a1 a2 a3 a4 a5 a6) ($);

  cards;

1 a 5 7 8 9

. . . . . .

11 b 12 13 14 15

9 c 100 101 102 103

104 c 106 107 108 109

;

data want (keep=a1-a6 r_id);

  if _n_ eq 1 then do;

    declare hash ha(hashexp: 16);

    ha.definekey('key');

    ha.definedata('hhold');

    ha.definedone();

  end;

  set have;

  array _house{*} $ a1-a6;

  do i=1 to dim(_house);

    key=_house{i};

    call missing(hhold);

    rc=ha.find();

    if rc=0 then do;

      found=1;

      r_id=hhold;

      leave;

    end;

  end;

  if not found then do;

    n+1;

    r_id=n;

  end;

  do j=1 to dim(_house);

    if not missing(_house{j}) then do;

      key=_house{j};

      hhold=r_id;

      ha.replace();

    end;

  end;

run;

Ksharp
Super User

Yes. I have already written it for sas_Form.

If your obs in a cluster are randomly stored in your dataset.

data test;
input  ( pan1 pan2 pan3 add1  ) ($);
datalines;
aaa   bbb    ccc     ddd    
. . . .
qqq   rrr       www   aaa   
rrr     ppp    mmm lll       
uuu   zzz    ffff      ppp     
p       l        m          n
. . . .
. . . .
jjjj    eee     rrr       ooo    
sss   www  .    .             
 .         .        .        eee 
;
run;

 

options compress=yes;
data want(keep=pan1 pan2 pan3 add1 household);
/*to make speed faster*/
 declare hash ha(hashexp : 20,ordered : 'a');
 declare hiter hi('ha');
  ha.definekey('count');
  ha.definedata('count','pan1','pan2','pan3','add1');
  ha.definedone();
 declare hash _ha(hashexp: 20,ordered : 'a');
  _ha.definekey('key');
  _ha.definedone();

do until(last);
 set test end=last; 
/*Remove obs which variable's are all missing firstly*/
 if cmiss(pan1,pan2,pan3,add1) lt 4 then do;
                                           count+1;
                                           ha.add();
                                         end;
end;

length key $ 40;
array h{4} $ 40 pan1 pan2 pan3 add1;
/*copy the first obs from Hash Table HA into PDV*/
_rc=hi.first();
do while(_rc eq 0); *until the end of Hash Table HA;
/*assign a unique cluster flag(i.e. household)*/
 household+1;
 do i=1 to 4;
/*push not missing value of current obs into another Hash Table _HA*/
   if not missing(h{i}) then do; key=h{i}; _ha.replace();end;
 end;   
/*start to run over Hash Table HA ,until can not find any more 
 observation which is the same cluster with current observation*/
 do until(x=1);
    x=1;
/*copy the first obs from Hash Table HA into PDV*/
    rc=hi.first();
    do while(rc=0);
      found=0;
      do j=1 to 4;
/*find whether any one of value is included in the current obs*/
      key=h{j};rcc=_ha.check();
      if rcc =0 then found=1;
      end;
      if found then do;
/*if any one of value is included,then push the obs which is copied from
Hash Table HA into Hash Tables _HA,flag it the same cluster with the 
current obs and output it into dataset*/
                      do k=1 to 4;
                      if not missing(h{k}) then do; key=h{k};_ha.replace();end;
                      end;
                      output;x=0; _count=count;*keep this found obs's index; 
                    end;
      rc=hi.next();
/*remove the found obs from Hash Table HA,since it has been seared*/
      if found then rx=ha.remove(key : _count);
     end;
 end;
/*clear up all the index which is the same cluster with the current obs*/ 
 _ha.clear();
/*copy the first obs from Hash Table HA into PDV*/
_rc=hi.first();
end; 
run;


Ksharp

R_Win
Calcite | Level 5

Thqs Ksharpthe code was working but it was taking time as i am having 8 crors of data can u help me to reduce the time

art297
Opal | Level 21

8 crors as in 80 million?  Easiest, quickest, most reliable and cheapest solution would be to buy a faster computer.

R_Win
Calcite | Level 5

I have tryed ur code but the linkages are not getting correctly and i am running this on SERVER ..

Ksharp
Super User

Sorry. I cann't do it anymore. If you delete ( ordered : 'a' ) in the hash table , maybe speed would be faster a little.

Are you sure that obs of a cluster is randomly stored in dataset?

If you only consider the previous obs for current obs ,then Art's code is very fast.

R_Win
Calcite | Level 5

I have tryed art code but the linkages are not getting correctly and i am running this on SERVER ..

art297
Opal | Level 21

FWIW, I posted the following on SAS-L ( http://www.listserv.uga.edu/cgi-bin/wa?A2=ind1109B&L=sas-l&D=1&H=0&O=D&T=1&P=10727 ) at the same time this discussion was going on in the forums last September.  Most of the ideas are KSharps, although a number of others offered suggestions I I really can't recall who should be credited with what.  Hopefully, you can use one of the methods to address your current problem:

data have;

  infile cards truncover;

  input (pan1 add1 pan2 pan3) (: $40.);

  cards;

aaa   bbb    ccc     ddd

qqq   rrr       www   aaa

rrr     ppp    mmm  lll

uuu   zzz    ffff      ppp

p       l        m          n

m      aaa

a  b  c  .

jjjj    eee     rrr       ooo

. . .   .

e  f  .  .

a  e  .  .

g  h  .  .

i  j  k  .

i  j  k  a

sss   www  .  .

.  .  . eee

;

run;

My macro/hash solution was:

%macro findthem;

  proc datasets library=work nowarn;

    delete recodes;

  quit;

 
  %let stopthis=0;


  /*Assign initial household numbers*/

  data for_recodes (keep=start hhold);

    if _n_ eq 1 then do;

      declare hash ha(hashexp: 16);

      ha.definekey('key');

      ha.definedata('hhold');

      ha.definedone();

    end;

    set have;

    array _house{*} $ 40 pan1--pan3;

    do i=1 to dim(_house);

      key=_house{i};

      call missing(hhold);

      rc=ha.find();

      if rc=0 then do;

        found=1;

        leave;

      end;

    end;

    if not found then do;

      n+1;

      hhold=n;

    end;

    do j=1 to dim(_house);

      if not missing(_house{j}) then do;

        key=_house{j};

        ha.replace();

        start=key;

        output;

      end;

    end;

  run;

 
  %let recodes=;

  %do %while (&stopthis eq 0);

    /*Test for invalid assignments*/

    proc sort data=for_recodes out=test nodupkey;

      by start hhold;

    run;

    data test2 (drop=n);

      set test end=eof;

      by start;

      if not(first.start and last.start) then do;

        output;

        n+1;

      end;

      if eof and n lt 1 then do;

        call symput('stopthis',1);

      end;

    run;

 
    %if &stopthis eq 0 %then %do;

      proc append base=recodes data=test2 force;

      run;

       proc sql noprint;

        select "if hhold eq "||

           strip(put(hhold,best12.))||

           " then hhold="||

           strip(put(min(hhold),best12.))||

           ";"

            into :recodes

              separated by " "

                from recodes

                        group by start

        ;

      quit;

 
       /*Correct identified discrepancies*/

      data for_recodes (keep=start hhold);

        if _n_ eq 1 then do;

          declare hash ha(hashexp: 16);

          ha.definekey('key');

          ha.definedata('hhold');

          ha.definedone();

        end;

        set have;

        array _house{*} $ 40 pan1--pan3;

        do i=1 to dim(_house);

          key=_house{i};

          call missing(hhold);

          rc=ha.find();

          if rc=0 then do;

            found=1;

            &recodes.

            leave;

          end;

        end;

        if not found then do;

          n+1;

          hhold=n;

          &recodes.

        end;

        do j=1 to dim(_house);

          if not missing(_house{j}) then do;

            key=_house{j};

            ha.replace();

            start=key;

            output;

          end;

        end;

      run;

    %end;

  %end;


  /*Get Final Dataset*/

  data want (keep=pan1 pan2 pan3 add1 household);

    if _n_ eq 1 then do;

      declare hash ha(hashexp: 16);

      ha.definekey('key');

      ha.definedata('hhold');

      ha.definedone();

    end;

    set have;

    array _house{*} $ 40 pan1--pan3;

    do i=1 to dim(_house);

      key=_house{i};

      call missing(hhold);

      rc=ha.find();

      if rc=0 then do;

        found=1;

        &recodes.

        household=hhold;

        leave;

      end;

    end;

    if not found then do;

      n+1;

      hhold=n;

      &recodes.

      household=hhold;

    end;

    do j=1 to dim(_house);

      if not missing(_house{j}) then do;

        key=_house{j};

        ha.replace();

      end;

    end;

  run;

%mend findthem;

 
%findthem

Ksharp's pure hash solution was:

 
data want(keep=pan1 pan2 pan3 add1 household);
  if _n_ =1 then do;
    declare hash ha(hashexp : 20,ordered: 'Y');
    declare hiter hi('ha');
    ha.definekey('count');
    ha.definedata('count','_pan1','_pan2','_pan3','_add1','searched');
    ha.definedone();
    do until(last);
     set have(rename=(pan1=_pan1 pan2=_pan2 pan3=_pan3 add1=_add1))
              end=last;
     count+1;
     searched=0;
     if not missing(_pan1) or not missing(_pan2) or
      not missing(_pan3) or not missing(_add1) then ha.add();
    end;


    declare hash _ha(hashexp: 20);
    _ ha.definekey('key');
    _ha.definedata('_household');
    _ha.definedone();
  end;
  set have ;
  _count+1;
  length key $ 40;
  array h{4} $ 40 pan1 pan2 pan3 add1;
  array _h{4} $ 40 _pan1 _pan2 _pan3 _add1;
  count=_count;searched=0;
  r=ha.find();
  if not searched then do;
    n+1;
    household=n;
    _household=n;
    do i=1 to 4;
      if not missing(h{i}) then do;
        key=h{i};
        _ha.replace();
      end;
    end;
    do until(x=1);
      x=1;
      rc=hi.first();
      do while(rc=0);
        if not searched then do;
          _found=0;
          do j=1 to 4;
            key=_h(j);
            _rc=_ha.check();
            if _rc =0 then _found=1;
          end;
          if _found then do;
            do k=1 to 4;
              if not missing(_h{k}) then do;
                key=_h(k);
                _ha.replace();
              end;
            end;
            searched=1;
            ha.replace();
            x=0;
          end;
        end;
        rc=hi.next();
      end;
    end;
  end;
  else do;
    do z=1 to 4;
      key=h{z};
      rr=_ha.find();
      if rr=0 then household=_household;
    end;
  end;
run;

Ksharp
Super User

Hi. Art .Thanks to remind my code at SAS-L.

But that code is older version . I have recoded and promoted it again ,and posted here already.

That is the fastest method I can think, but for the sake of OP's large table and the table's data construction,

It is still slow to speed .

Ksharp

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 1628 views
  • 0 likes
  • 3 in conversation