DATA Step, Macro, Functions and more

Loop through array re: lagged values until...

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 9
Accepted Solution

Loop through array re: lagged values until...

I have a relatively large dataset generated using Rick Wicklin's awesome christmas gift exchange program.

Data structure looks like this (real dataset contains 10 names):

year Name1 Name2 Name3 Name4


2011 Abe   Sue   Bob   Joy

2012 Abe   Bob   Sue   Joy

2013 Sue   Joy   Bob   Abe

2014 Bob   Abe   Joy   Sue

2015 Abe   Sue   Bob   Joy

Question is:  How can I iteratively pass through the dataset, deleting records until no subsequent year assigns anyone to give a gift to the same person they were assigned to buy for the previous year?  In the above example, this would mean deleting the 2nd observation in the first pass, and then in the next pass deleting the original 3rd (now 2nd) observation, resulting in a final dataset looking like this (it's no problem if obs 1 & obs 3 are duplicates):

year Name1 Name2 Name3 Name4


2011 Abe   Sue   Bob   Joy

2014 Bob   Abe   Joy   Sue

2015 Abe   Sue   Bob   Joy


I can do the first pass with the following code, but I'm having trouble figuring out the code to iterate until no more consecutive dupes exist across the 10 name variables as (until some final point) new ones will be created with each pass.

data exchange2;

set exchange;

array names (10) Name1-Name4;

     do i = 1 to 4;

          if names{i} = lag(names{i}) then delete;

     end;

drop i;

run;

I realize such a procedure will likely result in dropping the vast majority of the data, which is fine since right now there are ~67,000 rows/years and none of us plan to live that long! Smiley Happy  Thanks!


Accepted Solutions
Solution
‎11-22-2011 11:52 AM
Super User
Super User
Posts: 7,042

Re: Loop through array re: lagged values until...

Posted in reply to stonewaly

You probably need a macro to loop over the dataset until no more records are removed.

Something like this should work.  But it does not look like it yields a good solution as obs 3 is just the same as obs 1 the final dataset.

data have ;

input year (name1-name4) ($);

cards;

2011 Abe   Sue   Bob   Joy

2012 Abe   Bob   Sue   Joy

2013 Sue   Joy   Bob   Abe

2014 Bob   Abe   Joy   Sue

2015 Abe   Sue   Bob   Joy

run;

%macro winnow(in=,out=);

%local ndel;

data &out; set ∈ run;

%do %until(&ndel=0);

  data &out ;

    if eof then call symputx('ndel',ndel);

    set &out end=eof;

    array name (4) ;

    do i=1 to dim(name) ;

      if name(i) = lag(name(i)) then del=1;

    end;

    if del then do;

      ndel+1;

      delete;

    end;

    drop i del ndel;

  run;

%end;

%mend winnow;

%winnow(in=have,out=want);

proc print data=want width=min; run;

Obs    year    name1    name2    name3    name4

1     2011     Abe      Sue      Bob      Joy

2     2014     Bob      Abe      Joy      Sue

3     2015     Abe      Sue      Bob      Joy

View solution in original post


All Replies
Solution
‎11-22-2011 11:52 AM
Super User
Super User
Posts: 7,042

Re: Loop through array re: lagged values until...

Posted in reply to stonewaly

You probably need a macro to loop over the dataset until no more records are removed.

Something like this should work.  But it does not look like it yields a good solution as obs 3 is just the same as obs 1 the final dataset.

data have ;

input year (name1-name4) ($);

cards;

2011 Abe   Sue   Bob   Joy

2012 Abe   Bob   Sue   Joy

2013 Sue   Joy   Bob   Abe

2014 Bob   Abe   Joy   Sue

2015 Abe   Sue   Bob   Joy

run;

%macro winnow(in=,out=);

%local ndel;

data &out; set ∈ run;

%do %until(&ndel=0);

  data &out ;

    if eof then call symputx('ndel',ndel);

    set &out end=eof;

    array name (4) ;

    do i=1 to dim(name) ;

      if name(i) = lag(name(i)) then del=1;

    end;

    if del then do;

      ndel+1;

      delete;

    end;

    drop i del ndel;

  run;

%end;

%mend winnow;

%winnow(in=have,out=want);

proc print data=want width=min; run;

Obs    year    name1    name2    name3    name4

1     2011     Abe      Sue      Bob      Joy

2     2014     Bob      Abe      Joy      Sue

3     2015     Abe      Sue      Bob      Joy

Occasional Contributor
Posts: 9

Re: Loop through array re: lagged values until...

This did the trick!  No need to worry about the 3rd line being the same as the first.  That's never the case in the real data, and all that was needed was to make sure person X didn't ever get assigned to person Y in two consecutive years.  Thanks.

Super Contributor
Posts: 1,636

Re: Loop through array re: lagged values until...

Posted in reply to stonewaly

Hi Stonewaly,

your code deleted muitiple records in one pass. the outcome is the same as from Tom's macro.

data have ;

input year (name1-name4) ($);

cards;

2011 Abe   Sue   Bob   Joy

2012 Abe   Bob   Sue   Joy

2013 Sue   Joy   Bob   Abe

2014 Bob   Abe   Joy   Sue

2015 Abe   Sue   Bob   Joy

run;

data exchange2;

set have;

array names (4) Name1-Name4;

     do i = 1 to 4;

          if names{i} = lag(names{i}) then delete;

     end;

drop i;

run;

proc print;run;

Obs    year    name1    name2    name3    name4

1     2011     Abe      Sue      Bob      Joy

2     2014     Bob      Abe      Joy      Sue

3     2015     Abe      Sue      Bob      Joy

Occasional Contributor
Posts: 9

Re: Loop through array re: lagged values until...

Thanks Linlin.  That's true for the small test dataset posted with my question, but the actual dataset contained ~67,000 records and required several passes to completely remove repetitive assignments.

PROC Star
Posts: 7,471

Re: Loop through array re: lagged values until...

Posted in reply to stonewaly

If I've correctly kept up with the thread, you are not removing repetitive assignments.  I would think that order is irrelevant and that ALL of the assignments are duplicates.  Am I missing something?

Occasional Contributor
Posts: 9

Re: Loop through array re: lagged values until...

Art, my using the phrase "repetitive assignments" was probably a bit careless.  It's expected and OK to have person X assigned to person Y repeatedly over time, just not in consecutive years.  Year-to-year assignment order only matters here to the extent that such consecutive repeats are excluded.  Because the real data is for 10 people and includes 67K records (effectively infinite permutations given the typical human lifespan), culling the vast majority of it is fine and seemed like an effective way to go when I was originally considering the problem.  Just needed help writing a macro to incorporate iteration and the appropriate stopping condition.  Tom's worked for these purposes quite well.

Super User
Posts: 10,028

Re: Loop through array re: lagged values until...

Posted in reply to stonewaly

If you care about speed ,then use Hash Table. I bet you will like it more.

data have ;
 input year (name1-name4) ($);
cards;
2011 Abe   Sue   Bob   Joy
2012 Abe   Bob   Sue   Joy
2013 Sue   Joy   Bob   Abe
2014 Bob   Abe   Joy   Sue
2015 Abe   Sue   Bob   Joy
;
run;
data _null_;
 if 0 then set have;
 declare hash ha(hashexp:20,dataset:'have',ordered:'A');
 declare hiter hi('ha');
  ha.definekey('year');
  ha.definedata('year','name1','name2','name3','name4');
  ha.definedone();
do until(done); 
 done=1; count=0; n=ha.num_items;
 rc=hi.first(); 
 _name1=name1;_name2=name2;_name3=name3;_name4=name4;
 do while(rc=0);
  count+1;
  rc=hi.next();  
  if _name1 eq name1 or _name2 eq name2 or _name3 eq name3 or _name4 eq name4 then do;done=0;k=year;hi.next();leave;end;
  _name1=name1;_name2=name2;_name3=name3;_name4=name4;
  if n=count+1 then leave;
 end;
 if done eq 0 then do; rx=ha.remove(key:k); end;
end;
 ha.output(dataset:'want');
run;



Ksharp

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 1422 views
  • 0 likes
  • 5 in conversation