BookmarkSubscribeRSS Feed
R_Win
Calcite | Level 5

obs a1 a2 a3 a4 a5 a6  R_id
1 1 a 5 7 8 9 1
2 . . . . . . 2
3 11 b 12 13 14 15 3
4 9 c 100 101 102 103 4
5 104 c 106 107 108 109 4


obs=1 all are unique so r_id=1
obs=2 all are unique so r_id=2
obs=3 all are unique so r_id=2
obs=4 as a5 of obs1 and obs 4 of a1 having link
obs=5 as a2 of obs4 and obs 5 having linkage so r_id=4

i want like this

13 REPLIES 13
art297
Opal | Level 21

Your question is very similar to one asked a few months ago.  Take a look at: http://communities.sas.com/message/105224#105224

Ksharp
Super User

Your post is ambiguous.

What logic you want to apply to variable R_id.

If obs 4 has not linkage with obs 2, what value of R_id will be set for obs 4?

R_Win
Calcite | Level 5

obs a1 a2 a3 a4 a5 a6  R_id
1 1 a 5 7 8 9 1
2 . . . . . . 2
3 11 b 12 13 14 15 3
4 9 c 100 101 102 103 1
5 104 c 106 107 108 109 1


obs=1 all are unique so r_id=1
obs=2 all are unique so r_id=2
obs=3 all are unique so r_id=3
obs=4 as a5 of obs1 and obs 4 of a1 having link r_id=1
obs=5 as a2 of obs4 and obs 5 having linkage so r_id=1

sory i have corrected it

R_Win
Calcite | Level 5

obs a1 a2 a3 a4 a5 a6  R_id
1 1 a 5 7 8 9 1
2 . . . . . . 2
3 11 b 12 13 14 15 3
4 9 c 100 101 102 103 1
5 104 c 106 107 108 109 1


obs=1 all are unique so r_id=1
obs=2 all are unique so r_id=2
obs=3 all are unique so r_id=3
obs=4 as a5 of obs1 and obs 4 of a1 having link r_id=1
obs=5 as a2 of obs4 and obs 5 having linkage so r_id=1

art297
Opal | Level 21

Did you look at the post I suggested?  It sure seems like the same problem.  Try the following:

data have;

  input (a1 a2 a3 a4 a5 a6) ($);

  cards;

1 a 5 7 8 9

. . . . . .

11 b 12 13 14 15

9 c 100 101 102 103

104 c 106 107 108 109

;

data want (keep=a1-a6 r_id);

  if _n_ eq 1 then do;

    declare hash ha(hashexp: 16);

    ha.definekey('key');

    ha.definedata('hhold');

    ha.definedone();

  end;

  set have;

  array _house{*} $ a1-a6;

  do i=1 to dim(_house);

    key=_house{i};

    call missing(hhold);

    rc=ha.find();

    if rc=0 then do;

      found=1;

      r_id=hhold;

      leave;

    end;

  end;

  if not found then do;

    n+1;

    r_id=n;

  end;

  do j=1 to dim(_house);

    if not missing(_house{j}) then do;

      key=_house{j};

      hhold=r_id;

      ha.replace();

    end;

  end;

run;

Ksharp
Super User

Yes. I have already written it for sas_Form.

If your obs in a cluster are randomly stored in your dataset.

data test;
input  ( pan1 pan2 pan3 add1  ) ($);
datalines;
aaa   bbb    ccc     ddd    
. . . .
qqq   rrr       www   aaa   
rrr     ppp    mmm lll       
uuu   zzz    ffff      ppp     
p       l        m          n
. . . .
. . . .
jjjj    eee     rrr       ooo    
sss   www  .    .             
 .         .        .        eee 
;
run;

 

options compress=yes;
data want(keep=pan1 pan2 pan3 add1 household);
/*to make speed faster*/
 declare hash ha(hashexp : 20,ordered : 'a');
 declare hiter hi('ha');
  ha.definekey('count');
  ha.definedata('count','pan1','pan2','pan3','add1');
  ha.definedone();
 declare hash _ha(hashexp: 20,ordered : 'a');
  _ha.definekey('key');
  _ha.definedone();

do until(last);
 set test end=last; 
/*Remove obs which variable's are all missing firstly*/
 if cmiss(pan1,pan2,pan3,add1) lt 4 then do;
                                           count+1;
                                           ha.add();
                                         end;
end;

length key $ 40;
array h{4} $ 40 pan1 pan2 pan3 add1;
/*copy the first obs from Hash Table HA into PDV*/
_rc=hi.first();
do while(_rc eq 0); *until the end of Hash Table HA;
/*assign a unique cluster flag(i.e. household)*/
 household+1;
 do i=1 to 4;
/*push not missing value of current obs into another Hash Table _HA*/
   if not missing(h{i}) then do; key=h{i}; _ha.replace();end;
 end;   
/*start to run over Hash Table HA ,until can not find any more 
 observation which is the same cluster with current observation*/
 do until(x=1);
    x=1;
/*copy the first obs from Hash Table HA into PDV*/
    rc=hi.first();
    do while(rc=0);
      found=0;
      do j=1 to 4;
/*find whether any one of value is included in the current obs*/
      key=h{j};rcc=_ha.check();
      if rcc =0 then found=1;
      end;
      if found then do;
/*if any one of value is included,then push the obs which is copied from
Hash Table HA into Hash Tables _HA,flag it the same cluster with the 
current obs and output it into dataset*/
                      do k=1 to 4;
                      if not missing(h{k}) then do; key=h{k};_ha.replace();end;
                      end;
                      output;x=0; _count=count;*keep this found obs's index; 
                    end;
      rc=hi.next();
/*remove the found obs from Hash Table HA,since it has been seared*/
      if found then rx=ha.remove(key : _count);
     end;
 end;
/*clear up all the index which is the same cluster with the current obs*/ 
 _ha.clear();
/*copy the first obs from Hash Table HA into PDV*/
_rc=hi.first();
end; 
run;


Ksharp

R_Win
Calcite | Level 5

Thqs Ksharpthe code was working but it was taking time as i am having 8 crors of data can u help me to reduce the time

art297
Opal | Level 21

8 crors as in 80 million?  Easiest, quickest, most reliable and cheapest solution would be to buy a faster computer.

R_Win
Calcite | Level 5

I have tryed ur code but the linkages are not getting correctly and i am running this on SERVER ..

Ksharp
Super User

Sorry. I cann't do it anymore. If you delete ( ordered : 'a' ) in the hash table , maybe speed would be faster a little.

Are you sure that obs of a cluster is randomly stored in dataset?

If you only consider the previous obs for current obs ,then Art's code is very fast.

R_Win
Calcite | Level 5

I have tryed art code but the linkages are not getting correctly and i am running this on SERVER ..

art297
Opal | Level 21

FWIW, I posted the following on SAS-L ( http://www.listserv.uga.edu/cgi-bin/wa?A2=ind1109B&L=sas-l&D=1&H=0&O=D&T=1&P=10727 ) at the same time this discussion was going on in the forums last September.  Most of the ideas are KSharps, although a number of others offered suggestions I I really can't recall who should be credited with what.  Hopefully, you can use one of the methods to address your current problem:

data have;

  infile cards truncover;

  input (pan1 add1 pan2 pan3) (: $40.);

  cards;

aaa   bbb    ccc     ddd

qqq   rrr       www   aaa

rrr     ppp    mmm  lll

uuu   zzz    ffff      ppp

p       l        m          n

m      aaa

a  b  c  .

jjjj    eee     rrr       ooo

. . .   .

e  f  .  .

a  e  .  .

g  h  .  .

i  j  k  .

i  j  k  a

sss   www  .  .

.  .  . eee

;

run;

My macro/hash solution was:

%macro findthem;

  proc datasets library=work nowarn;

    delete recodes;

  quit;

 
  %let stopthis=0;


  /*Assign initial household numbers*/

  data for_recodes (keep=start hhold);

    if _n_ eq 1 then do;

      declare hash ha(hashexp: 16);

      ha.definekey('key');

      ha.definedata('hhold');

      ha.definedone();

    end;

    set have;

    array _house{*} $ 40 pan1--pan3;

    do i=1 to dim(_house);

      key=_house{i};

      call missing(hhold);

      rc=ha.find();

      if rc=0 then do;

        found=1;

        leave;

      end;

    end;

    if not found then do;

      n+1;

      hhold=n;

    end;

    do j=1 to dim(_house);

      if not missing(_house{j}) then do;

        key=_house{j};

        ha.replace();

        start=key;

        output;

      end;

    end;

  run;

 
  %let recodes=;

  %do %while (&stopthis eq 0);

    /*Test for invalid assignments*/

    proc sort data=for_recodes out=test nodupkey;

      by start hhold;

    run;

    data test2 (drop=n);

      set test end=eof;

      by start;

      if not(first.start and last.start) then do;

        output;

        n+1;

      end;

      if eof and n lt 1 then do;

        call symput('stopthis',1);

      end;

    run;

 
    %if &stopthis eq 0 %then %do;

      proc append base=recodes data=test2 force;

      run;

       proc sql noprint;

        select "if hhold eq "||

           strip(put(hhold,best12.))||

           " then hhold="||

           strip(put(min(hhold),best12.))||

           ";"

            into :recodes

              separated by " "

                from recodes

                        group by start

        ;

      quit;

 
       /*Correct identified discrepancies*/

      data for_recodes (keep=start hhold);

        if _n_ eq 1 then do;

          declare hash ha(hashexp: 16);

          ha.definekey('key');

          ha.definedata('hhold');

          ha.definedone();

        end;

        set have;

        array _house{*} $ 40 pan1--pan3;

        do i=1 to dim(_house);

          key=_house{i};

          call missing(hhold);

          rc=ha.find();

          if rc=0 then do;

            found=1;

            &recodes.

            leave;

          end;

        end;

        if not found then do;

          n+1;

          hhold=n;

          &recodes.

        end;

        do j=1 to dim(_house);

          if not missing(_house{j}) then do;

            key=_house{j};

            ha.replace();

            start=key;

            output;

          end;

        end;

      run;

    %end;

  %end;


  /*Get Final Dataset*/

  data want (keep=pan1 pan2 pan3 add1 household);

    if _n_ eq 1 then do;

      declare hash ha(hashexp: 16);

      ha.definekey('key');

      ha.definedata('hhold');

      ha.definedone();

    end;

    set have;

    array _house{*} $ 40 pan1--pan3;

    do i=1 to dim(_house);

      key=_house{i};

      call missing(hhold);

      rc=ha.find();

      if rc=0 then do;

        found=1;

        &recodes.

        household=hhold;

        leave;

      end;

    end;

    if not found then do;

      n+1;

      hhold=n;

      &recodes.

      household=hhold;

    end;

    do j=1 to dim(_house);

      if not missing(_house{j}) then do;

        key=_house{j};

        ha.replace();

      end;

    end;

  run;

%mend findthem;

 
%findthem

Ksharp's pure hash solution was:

 
data want(keep=pan1 pan2 pan3 add1 household);
  if _n_ =1 then do;
    declare hash ha(hashexp : 20,ordered: 'Y');
    declare hiter hi('ha');
    ha.definekey('count');
    ha.definedata('count','_pan1','_pan2','_pan3','_add1','searched');
    ha.definedone();
    do until(last);
     set have(rename=(pan1=_pan1 pan2=_pan2 pan3=_pan3 add1=_add1))
              end=last;
     count+1;
     searched=0;
     if not missing(_pan1) or not missing(_pan2) or
      not missing(_pan3) or not missing(_add1) then ha.add();
    end;


    declare hash _ha(hashexp: 20);
    _ ha.definekey('key');
    _ha.definedata('_household');
    _ha.definedone();
  end;
  set have ;
  _count+1;
  length key $ 40;
  array h{4} $ 40 pan1 pan2 pan3 add1;
  array _h{4} $ 40 _pan1 _pan2 _pan3 _add1;
  count=_count;searched=0;
  r=ha.find();
  if not searched then do;
    n+1;
    household=n;
    _household=n;
    do i=1 to 4;
      if not missing(h{i}) then do;
        key=h{i};
        _ha.replace();
      end;
    end;
    do until(x=1);
      x=1;
      rc=hi.first();
      do while(rc=0);
        if not searched then do;
          _found=0;
          do j=1 to 4;
            key=_h(j);
            _rc=_ha.check();
            if _rc =0 then _found=1;
          end;
          if _found then do;
            do k=1 to 4;
              if not missing(_h{k}) then do;
                key=_h(k);
                _ha.replace();
              end;
            end;
            searched=1;
            ha.replace();
            x=0;
          end;
        end;
        rc=hi.next();
      end;
    end;
  end;
  else do;
    do z=1 to 4;
      key=h{z};
      rr=_ha.find();
      if rr=0 then household=_household;
    end;
  end;
run;

Ksharp
Super User

Hi. Art .Thanks to remind my code at SAS-L.

But that code is older version . I have recoded and promoted it again ,and posted here already.

That is the fastest method I can think, but for the sake of OP's large table and the table's data construction,

It is still slow to speed .

Ksharp

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1062 views
  • 0 likes
  • 3 in conversation