BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
stonewaly
Calcite | Level 5

I have a relatively large dataset generated using Rick Wicklin's awesome christmas gift exchange program.

Data structure looks like this (real dataset contains 10 names):

year Name1 Name2 Name3 Name4


2011 Abe   Sue   Bob   Joy

2012 Abe   Bob   Sue   Joy

2013 Sue   Joy   Bob   Abe

2014 Bob   Abe   Joy   Sue

2015 Abe   Sue   Bob   Joy

Question is:  How can I iteratively pass through the dataset, deleting records until no subsequent year assigns anyone to give a gift to the same person they were assigned to buy for the previous year?  In the above example, this would mean deleting the 2nd observation in the first pass, and then in the next pass deleting the original 3rd (now 2nd) observation, resulting in a final dataset looking like this (it's no problem if obs 1 & obs 3 are duplicates):

year Name1 Name2 Name3 Name4


2011 Abe   Sue   Bob   Joy

2014 Bob   Abe   Joy   Sue

2015 Abe   Sue   Bob   Joy


I can do the first pass with the following code, but I'm having trouble figuring out the code to iterate until no more consecutive dupes exist across the 10 name variables as (until some final point) new ones will be created with each pass.

data exchange2;

set exchange;

array names (10) Name1-Name4;

     do i = 1 to 4;

          if names{i} = lag(names{i}) then delete;

     end;

drop i;

run;

I realize such a procedure will likely result in dropping the vast majority of the data, which is fine since right now there are ~67,000 rows/years and none of us plan to live that long! Smiley Happy  Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

You probably need a macro to loop over the dataset until no more records are removed.

Something like this should work.  But it does not look like it yields a good solution as obs 3 is just the same as obs 1 the final dataset.

data have ;

input year (name1-name4) ($);

cards;

2011 Abe   Sue   Bob   Joy

2012 Abe   Bob   Sue   Joy

2013 Sue   Joy   Bob   Abe

2014 Bob   Abe   Joy   Sue

2015 Abe   Sue   Bob   Joy

run;

%macro winnow(in=,out=);

%local ndel;

data &out; set ∈ run;

%do %until(&ndel=0);

  data &out ;

    if eof then call symputx('ndel',ndel);

    set &out end=eof;

    array name (4) ;

    do i=1 to dim(name) ;

      if name(i) = lag(name(i)) then del=1;

    end;

    if del then do;

      ndel+1;

      delete;

    end;

    drop i del ndel;

  run;

%end;

%mend winnow;

%winnow(in=have,out=want);

proc print data=want width=min; run;

Obs    year    name1    name2    name3    name4

1     2011     Abe      Sue      Bob      Joy

2     2014     Bob      Abe      Joy      Sue

3     2015     Abe      Sue      Bob      Joy

View solution in original post

7 REPLIES 7
Tom
Super User Tom
Super User

You probably need a macro to loop over the dataset until no more records are removed.

Something like this should work.  But it does not look like it yields a good solution as obs 3 is just the same as obs 1 the final dataset.

data have ;

input year (name1-name4) ($);

cards;

2011 Abe   Sue   Bob   Joy

2012 Abe   Bob   Sue   Joy

2013 Sue   Joy   Bob   Abe

2014 Bob   Abe   Joy   Sue

2015 Abe   Sue   Bob   Joy

run;

%macro winnow(in=,out=);

%local ndel;

data &out; set ∈ run;

%do %until(&ndel=0);

  data &out ;

    if eof then call symputx('ndel',ndel);

    set &out end=eof;

    array name (4) ;

    do i=1 to dim(name) ;

      if name(i) = lag(name(i)) then del=1;

    end;

    if del then do;

      ndel+1;

      delete;

    end;

    drop i del ndel;

  run;

%end;

%mend winnow;

%winnow(in=have,out=want);

proc print data=want width=min; run;

Obs    year    name1    name2    name3    name4

1     2011     Abe      Sue      Bob      Joy

2     2014     Bob      Abe      Joy      Sue

3     2015     Abe      Sue      Bob      Joy

stonewaly
Calcite | Level 5

This did the trick!  No need to worry about the 3rd line being the same as the first.  That's never the case in the real data, and all that was needed was to make sure person X didn't ever get assigned to person Y in two consecutive years.  Thanks.

Linlin
Lapis Lazuli | Level 10

Hi Stonewaly,

your code deleted muitiple records in one pass. the outcome is the same as from Tom's macro.

data have ;

input year (name1-name4) ($);

cards;

2011 Abe   Sue   Bob   Joy

2012 Abe   Bob   Sue   Joy

2013 Sue   Joy   Bob   Abe

2014 Bob   Abe   Joy   Sue

2015 Abe   Sue   Bob   Joy

run;

data exchange2;

set have;

array names (4) Name1-Name4;

     do i = 1 to 4;

          if names{i} = lag(names{i}) then delete;

     end;

drop i;

run;

proc print;run;

Obs    year    name1    name2    name3    name4

1     2011     Abe      Sue      Bob      Joy

2     2014     Bob      Abe      Joy      Sue

3     2015     Abe      Sue      Bob      Joy

stonewaly
Calcite | Level 5

Thanks Linlin.  That's true for the small test dataset posted with my question, but the actual dataset contained ~67,000 records and required several passes to completely remove repetitive assignments.

art297
Opal | Level 21

If I've correctly kept up with the thread, you are not removing repetitive assignments.  I would think that order is irrelevant and that ALL of the assignments are duplicates.  Am I missing something?

stonewaly
Calcite | Level 5

Art, my using the phrase "repetitive assignments" was probably a bit careless.  It's expected and OK to have person X assigned to person Y repeatedly over time, just not in consecutive years.  Year-to-year assignment order only matters here to the extent that such consecutive repeats are excluded.  Because the real data is for 10 people and includes 67K records (effectively infinite permutations given the typical human lifespan), culling the vast majority of it is fine and seemed like an effective way to go when I was originally considering the problem.  Just needed help writing a macro to incorporate iteration and the appropriate stopping condition.  Tom's worked for these purposes quite well.

Ksharp
Super User

If you care about speed ,then use Hash Table. I bet you will like it more.

data have ;
 input year (name1-name4) ($);
cards;
2011 Abe   Sue   Bob   Joy
2012 Abe   Bob   Sue   Joy
2013 Sue   Joy   Bob   Abe
2014 Bob   Abe   Joy   Sue
2015 Abe   Sue   Bob   Joy
;
run;
data _null_;
 if 0 then set have;
 declare hash ha(hashexp:20,dataset:'have',ordered:'A');
 declare hiter hi('ha');
  ha.definekey('year');
  ha.definedata('year','name1','name2','name3','name4');
  ha.definedone();
do until(done); 
 done=1; count=0; n=ha.num_items;
 rc=hi.first(); 
 _name1=name1;_name2=name2;_name3=name3;_name4=name4;
 do while(rc=0);
  count+1;
  rc=hi.next();  
  if _name1 eq name1 or _name2 eq name2 or _name3 eq name3 or _name4 eq name4 then do;done=0;k=year;hi.next();leave;end;
  _name1=name1;_name2=name2;_name3=name3;_name4=name4;
  if n=count+1 then leave;
 end;
 if done eq 0 then do; rx=ha.remove(key:k); end;
end;
 ha.output(dataset:'want');
run;



Ksharp

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 3258 views
  • 0 likes
  • 5 in conversation