I have a relatively large dataset generated using Rick Wicklin's awesome christmas gift exchange program.
Data structure looks like this (real dataset contains 10 names):
year Name1 Name2 Name3 Name4
2011 Abe Sue Bob Joy
2012 Abe Bob Sue Joy
2013 Sue Joy Bob Abe
2014 Bob Abe Joy Sue
2015 Abe Sue Bob Joy
Question is: How can I iteratively pass through the dataset, deleting records until no subsequent year assigns anyone to give a gift to the same person they were assigned to buy for the previous year? In the above example, this would mean deleting the 2nd observation in the first pass, and then in the next pass deleting the original 3rd (now 2nd) observation, resulting in a final dataset looking like this (it's no problem if obs 1 & obs 3 are duplicates):
year Name1 Name2 Name3 Name4
2011 Abe Sue Bob Joy
2014 Bob Abe Joy Sue
2015 Abe Sue Bob Joy
I can do the first pass with the following code, but I'm having trouble figuring out the code to iterate until no more consecutive dupes exist across the 10 name variables as (until some final point) new ones will be created with each pass.
data exchange2;
set exchange;
array names (10) Name1-Name4;
do i = 1 to 4;
if names{i} = lag(names{i}) then delete;
end;
drop i;
run;
I realize such a procedure will likely result in dropping the vast majority of the data, which is fine since right now there are ~67,000 rows/years and none of us plan to live that long! Thanks!
You probably need a macro to loop over the dataset until no more records are removed.
Something like this should work. But it does not look like it yields a good solution as obs 3 is just the same as obs 1 the final dataset.
data have ;
input year (name1-name4) ($);
cards;
2011 Abe Sue Bob Joy
2012 Abe Bob Sue Joy
2013 Sue Joy Bob Abe
2014 Bob Abe Joy Sue
2015 Abe Sue Bob Joy
run;
%macro winnow(in=,out=);
%local ndel;
data &out; set ∈ run;
%do %until(&ndel=0);
data &out ;
if eof then call symputx('ndel',ndel);
set &out end=eof;
array name (4) ;
do i=1 to dim(name) ;
if name(i) = lag(name(i)) then del=1;
end;
if del then do;
ndel+1;
delete;
end;
drop i del ndel;
run;
%end;
%mend winnow;
%winnow(in=have,out=want);
proc print data=want width=min; run;
Obs year name1 name2 name3 name4
1 2011 Abe Sue Bob Joy
2 2014 Bob Abe Joy Sue
3 2015 Abe Sue Bob Joy
You probably need a macro to loop over the dataset until no more records are removed.
Something like this should work. But it does not look like it yields a good solution as obs 3 is just the same as obs 1 the final dataset.
data have ;
input year (name1-name4) ($);
cards;
2011 Abe Sue Bob Joy
2012 Abe Bob Sue Joy
2013 Sue Joy Bob Abe
2014 Bob Abe Joy Sue
2015 Abe Sue Bob Joy
run;
%macro winnow(in=,out=);
%local ndel;
data &out; set ∈ run;
%do %until(&ndel=0);
data &out ;
if eof then call symputx('ndel',ndel);
set &out end=eof;
array name (4) ;
do i=1 to dim(name) ;
if name(i) = lag(name(i)) then del=1;
end;
if del then do;
ndel+1;
delete;
end;
drop i del ndel;
run;
%end;
%mend winnow;
%winnow(in=have,out=want);
proc print data=want width=min; run;
Obs year name1 name2 name3 name4
1 2011 Abe Sue Bob Joy
2 2014 Bob Abe Joy Sue
3 2015 Abe Sue Bob Joy
This did the trick! No need to worry about the 3rd line being the same as the first. That's never the case in the real data, and all that was needed was to make sure person X didn't ever get assigned to person Y in two consecutive years. Thanks.
Hi Stonewaly,
your code deleted muitiple records in one pass. the outcome is the same as from Tom's macro.
data have ;
input year (name1-name4) ($);
cards;
2011 Abe Sue Bob Joy
2012 Abe Bob Sue Joy
2013 Sue Joy Bob Abe
2014 Bob Abe Joy Sue
2015 Abe Sue Bob Joy
run;
data exchange2;
set have;
array names (4) Name1-Name4;
do i = 1 to 4;
if names{i} = lag(names{i}) then delete;
end;
drop i;
run;
proc print;run;
Obs year name1 name2 name3 name4
1 2011 Abe Sue Bob Joy
2 2014 Bob Abe Joy Sue
3 2015 Abe Sue Bob Joy
Thanks Linlin. That's true for the small test dataset posted with my question, but the actual dataset contained ~67,000 records and required several passes to completely remove repetitive assignments.
If I've correctly kept up with the thread, you are not removing repetitive assignments. I would think that order is irrelevant and that ALL of the assignments are duplicates. Am I missing something?
Art, my using the phrase "repetitive assignments" was probably a bit careless. It's expected and OK to have person X assigned to person Y repeatedly over time, just not in consecutive years. Year-to-year assignment order only matters here to the extent that such consecutive repeats are excluded. Because the real data is for 10 people and includes 67K records (effectively infinite permutations given the typical human lifespan), culling the vast majority of it is fine and seemed like an effective way to go when I was originally considering the problem. Just needed help writing a macro to incorporate iteration and the appropriate stopping condition. Tom's worked for these purposes quite well.
If you care about speed ,then use Hash Table. I bet you will like it more.
data have ; input year (name1-name4) ($); cards; 2011 Abe Sue Bob Joy 2012 Abe Bob Sue Joy 2013 Sue Joy Bob Abe 2014 Bob Abe Joy Sue 2015 Abe Sue Bob Joy ; run; data _null_; if 0 then set have; declare hash ha(hashexp:20,dataset:'have',ordered:'A'); declare hiter hi('ha'); ha.definekey('year'); ha.definedata('year','name1','name2','name3','name4'); ha.definedone(); do until(done); done=1; count=0; n=ha.num_items; rc=hi.first(); _name1=name1;_name2=name2;_name3=name3;_name4=name4; do while(rc=0); count+1; rc=hi.next(); if _name1 eq name1 or _name2 eq name2 or _name3 eq name3 or _name4 eq name4 then do;done=0;k=year;hi.next();leave;end; _name1=name1;_name2=name2;_name3=name3;_name4=name4; if n=count+1 then leave; end; if done eq 0 then do; rx=ha.remove(key:k); end; end; ha.output(dataset:'want'); run;
Ksharp
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.