This is extension to my previous question.
I would like to output last. if the group value is same.
If the group value is different I want to retain both records.(keep both Bold records below)
data to_clean;
infile cards dlm='|' truncover ;
input subDate :mmddyy10. unitName :$100. ADDR1 :$100. group $4.;
format subDate yymmdd10.;
cards;
11/21/2020|BAIRD HUDSON ENTERPRISES|106 E MAIN|TNF
10/30/2020|BAIRD HUDSON ENTERPRISES|106 E MAIN STREET|TNF
10/30/2020|BIG HORN ENTERPRISE|146 S. BENT STREET|SNF
10/30/2020|BIG HORN ENTERPRISE|641 WARREN STREET|SNF
11/5/2020|BROOKDALE |tt|ALF
10/29/2020|BROOKDALE|2401 COUGAR AVENUE|ALF
10/30/2020|ELMCROFT|1551 SUGARLAND DRIVE|ALF
11/2/2020|ELMCROFT|1551 SUGARLAND DRIVE DRIVE|SNF
11/21/2020|GREEN HOUSE LIVING|2311 SHIRLEY|SNF
10/29/2020|GREEN HOUSE LIVING|2311 SHIRLEY COVE|ALF
11/21/2020|MISSION AT THE VILLA|1445 UINTA|ALF
11/2/2020|MISSION AT THE VILLA|1445 UINTA DRIVE|ALF
;
proc sort data=to_clean; by unitName subDate; run;
proc sql;
create table to_clean_new as
select * from to_clean
order by unitName, subDate, group;
quit;
data to_clean_2;
set to_clean_new;
by unitName;
length goodAddr $100;
retain goodAddr;
if first.unitName then goodAddr = addr1;
else if complev(trim(goodAddr), trim(addr1), "IL:") > 2 then goodAddr = addr1;
run;
proc sort data=to_clean_2; by unitName goodAddr subDate group; run;
data want;
set to_clean_2;
by unitName goodAddr;
if last.goodAddr;
run;
proc print noobs data=want; run;
****
Thank you
I've restated your objective to "keep the last observation (highest subdate) for each unitname/group combination). If that's correct, then this application of proc summary and a subsequent use of a hash object will do, which prunes the dataset while preserving the original order of the data (in case that's important). It requires only 2 passes through the data:
data to_clean;
infile cards dlm='|' truncover ;
input subDate :mmddyy10. unitName :$100. ADDR1 :$100. group $4.;
format subDate yymmdd10.;
cards;
11/21/2020|BAIRD HUDSON ENTERPRISES|106 E MAIN|TNF
10/30/2020|BAIRD HUDSON ENTERPRISES|106 E MAIN STREET|TNF
10/30/2020|BIG HORN ENTERPRISE|146 S. BENT STREET|SNF
10/30/2020|BIG HORN ENTERPRISE|641 WARREN STREET|SNF
11/5/2020|BROOKDALE |tt|ALF
10/29/2020|BROOKDALE|2401 COUGAR AVENUE|ALF
10/30/2020|ELMCROFT|1551 SUGARLAND DRIVE|ALF
11/2/2020|ELMCROFT|1551 SUGARLAND DRIVE DRIVE|SNF
11/21/2020|GREEN HOUSE LIVING|2311 SHIRLEY|SNF
10/29/2020|GREEN HOUSE LIVING|2311 SHIRLEY COVE|ALF
11/21/2020|MISSION AT THE VILLA|1445 UINTA|ALF
11/2/2020|MISSION AT THE VILLA|1445 UINTA DRIVE|ALF
;
proc summary data=to_clean nway;
class unitname group;
var subdate;
output out=need (drop=_:) max=subdate;
run;
data want;
set to_clean;
if _n_=1 then do;
declare hash h (dataset:'need');
h.definekey(all:'Y');
h.definedone();
end;
if h.check()=0;
run;
The proc summary (because of the NWAY option) outputs a dataset with one observation for each unitname/group combination. It records the maximum value of the analysis variable subdate for each combo, thereby providing the desired unitname/group/subdate values.
Stick that dataset in the hash object h, and check each incoming obs from to_clean against h, and keep only those whose key values (unitname,group,subdate) are found in h (i.e. "if h.check()=0").
Hi @Stalk I'm not sure of your expected output. However, rather than looking at your code, I'm taking a guess with your bold lines and your description-
"I would like to output last. if the group value is same.
If the group value is different I want to retain both records.(keep both Bold records below)"
data to_clean;
infile cards dlm='|' truncover ;
input subDate :mmddyy10. unitName :$100. ADDR1 :$100. group $4.;
format subDate yymmdd10.;
cards;
11/21/2020|BAIRD HUDSON ENTERPRISES|106 E MAIN|TNF
10/30/2020|BAIRD HUDSON ENTERPRISES|106 E MAIN STREET|TNF
10/30/2020|BIG HORN ENTERPRISE|146 S. BENT STREET|SNF
10/30/2020|BIG HORN ENTERPRISE|641 WARREN STREET|SNF
11/5/2020|BROOKDALE |tt|ALF
10/29/2020|BROOKDALE|2401 COUGAR AVENUE|ALF
10/30/2020|ELMCROFT|1551 SUGARLAND DRIVE|ALF
11/2/2020|ELMCROFT|1551 SUGARLAND DRIVE DRIVE|SNF
11/21/2020|GREEN HOUSE LIVING|2311 SHIRLEY|SNF
10/29/2020|GREEN HOUSE LIVING|2311 SHIRLEY COVE|ALF
11/21/2020|MISSION AT THE VILLA|1445 UINTA|ALF
11/2/2020|MISSION AT THE VILLA|1445 UINTA DRIVE|ALF
;
data want;
do _n_=1 by 1 until(last.unitname);
set to_clean;
by unitname group notsorted;
if first.group then n=sum(n,1);
end;
if n=1 then output;
else do _n_=1 to _n_;
set to_clean;
output;
end;
drop n;
run;
I've restated your objective to "keep the last observation (highest subdate) for each unitname/group combination). If that's correct, then this application of proc summary and a subsequent use of a hash object will do, which prunes the dataset while preserving the original order of the data (in case that's important). It requires only 2 passes through the data:
data to_clean;
infile cards dlm='|' truncover ;
input subDate :mmddyy10. unitName :$100. ADDR1 :$100. group $4.;
format subDate yymmdd10.;
cards;
11/21/2020|BAIRD HUDSON ENTERPRISES|106 E MAIN|TNF
10/30/2020|BAIRD HUDSON ENTERPRISES|106 E MAIN STREET|TNF
10/30/2020|BIG HORN ENTERPRISE|146 S. BENT STREET|SNF
10/30/2020|BIG HORN ENTERPRISE|641 WARREN STREET|SNF
11/5/2020|BROOKDALE |tt|ALF
10/29/2020|BROOKDALE|2401 COUGAR AVENUE|ALF
10/30/2020|ELMCROFT|1551 SUGARLAND DRIVE|ALF
11/2/2020|ELMCROFT|1551 SUGARLAND DRIVE DRIVE|SNF
11/21/2020|GREEN HOUSE LIVING|2311 SHIRLEY|SNF
10/29/2020|GREEN HOUSE LIVING|2311 SHIRLEY COVE|ALF
11/21/2020|MISSION AT THE VILLA|1445 UINTA|ALF
11/2/2020|MISSION AT THE VILLA|1445 UINTA DRIVE|ALF
;
proc summary data=to_clean nway;
class unitname group;
var subdate;
output out=need (drop=_:) max=subdate;
run;
data want;
set to_clean;
if _n_=1 then do;
declare hash h (dataset:'need');
h.definekey(all:'Y');
h.definedone();
end;
if h.check()=0;
run;
The proc summary (because of the NWAY option) outputs a dataset with one observation for each unitname/group combination. It records the maximum value of the analysis variable subdate for each combo, thereby providing the desired unitname/group/subdate values.
Stick that dataset in the hash object h, and check each incoming obs from to_clean against h, and keep only those whose key values (unitname,group,subdate) are found in h (i.e. "if h.check()=0").
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.