Hi all, I'm seeking help in preparing a data set which involves multiple criteria to sieve out my desired results. Any pointers would be much appreciated, thank you.
Conditions: If there is duplicate ID, then check if there is duplicate record in Year. If there is duplicate in Year, keep only Hardcopy, else keep remaining record. (Keep record if there is no duplicates in any field).
data have;
input ID Year Origin$;
cards;
1 2015 Hardcopy
1 2015 Softcopy
1 2016 Hardcopy
1 2017 Softcopy
2 2015 Softcopy
2 2017 Softcopy
3 2016 Hardcopy;
data have;
input ID Year Origin$;
if Origin='Hardcopy' then n=1;else n=2;
cards;
1 2015 Hardcopy
1 2015 Softcopy
1 2016 Hardcopy
1 2017 Softcopy
2 2015 Softcopy
2 2017 Softcopy
3 2016 Hardcopy
;
run;
proc sort data=have out=temp;
by id year n;
run;
proc sort data=temp out=want nodupkey;
by id year;
run;
Something like:
data have; input ID Year Origin$; cards; 1 2015 Hardcopy 1 2015 Softcopy 1 2016 Hardcopy 1 2017 Softcopy 2 2015 Softcopy 2 2017 Softcopy 3 2016 Hardcopy ; run; proc sort data=have; by id year descending origin; run; data want; set have; by id year; if first.year and last.year then output; else if origin="Hardcopy" and not first.year then output; run;
data have;
input ID Year Origin$;
if Origin='Hardcopy' then n=1;else n=2;
cards;
1 2015 Hardcopy
1 2015 Softcopy
1 2016 Hardcopy
1 2017 Softcopy
2 2015 Softcopy
2 2017 Softcopy
3 2016 Hardcopy
;
run;
proc sort data=have out=temp;
by id year n;
run;
proc sort data=temp out=want nodupkey;
by id year;
run;
Thanks Ksharp, for the prompt and concise guide.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.