BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Stalk
Pyrite | Level 9

This is extension to my previous question.

I would like to output last. if the group value is same.

If the group value is different I want to retain both records.(keep both Bold records below)

 

data to_clean;
infile cards dlm='|' truncover ;
input subDate :mmddyy10. unitName :$100. ADDR1 :$100. group $4.;
format subDate yymmdd10.;
cards;
11/21/2020|BAIRD HUDSON ENTERPRISES|106 E MAIN|TNF
10/30/2020|BAIRD HUDSON ENTERPRISES|106 E MAIN STREET|TNF
10/30/2020|BIG HORN ENTERPRISE|146 S. BENT STREET|SNF
10/30/2020|BIG HORN ENTERPRISE|641 WARREN STREET|SNF
11/5/2020|BROOKDALE |tt|ALF
10/29/2020|BROOKDALE|2401 COUGAR AVENUE|ALF
10/30/2020|ELMCROFT|1551 SUGARLAND DRIVE|ALF
11/2/2020|ELMCROFT|1551 SUGARLAND DRIVE DRIVE|SNF
11/21/2020|GREEN HOUSE LIVING|2311 SHIRLEY|SNF
10/29/2020|GREEN HOUSE LIVING|2311 SHIRLEY COVE|ALF
11/21/2020|MISSION AT THE VILLA|1445 UINTA|ALF
11/2/2020|MISSION AT THE VILLA|1445 UINTA DRIVE|ALF
;

proc sort data=to_clean; by unitName subDate; run;

proc sql;
create table to_clean_new as
select * from to_clean
order by unitName, subDate, group;
quit;


data to_clean_2;
set to_clean_new;
by unitName;
length goodAddr $100;
retain goodAddr;
if first.unitName then goodAddr = addr1;
else if complev(trim(goodAddr), trim(addr1), "IL:") > 2 then goodAddr = addr1;
run;

proc sort data=to_clean_2; by unitName goodAddr subDate group; run;

data want;
set to_clean_2;
by unitName goodAddr;
if last.goodAddr;
run;

proc print noobs data=want; run;

****

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

I've restated your objective to "keep the last observation (highest subdate) for each unitname/group combination).  If that's correct, then this application of proc summary and a subsequent use of a hash object will do, which prunes the dataset while preserving the original order of the data (in case that's important).  It requires only 2 passes through the data:

 

data to_clean;
infile cards dlm='|' truncover ;
input subDate :mmddyy10. unitName :$100. ADDR1 :$100. group $4.;
format subDate yymmdd10.;
cards;
11/21/2020|BAIRD HUDSON ENTERPRISES|106 E MAIN|TNF
10/30/2020|BAIRD HUDSON ENTERPRISES|106 E MAIN STREET|TNF
10/30/2020|BIG HORN ENTERPRISE|146 S. BENT STREET|SNF
10/30/2020|BIG HORN ENTERPRISE|641 WARREN STREET|SNF
11/5/2020|BROOKDALE |tt|ALF
10/29/2020|BROOKDALE|2401 COUGAR AVENUE|ALF
10/30/2020|ELMCROFT|1551 SUGARLAND DRIVE|ALF
11/2/2020|ELMCROFT|1551 SUGARLAND DRIVE DRIVE|SNF
11/21/2020|GREEN HOUSE LIVING|2311 SHIRLEY|SNF
10/29/2020|GREEN HOUSE LIVING|2311 SHIRLEY COVE|ALF
11/21/2020|MISSION AT THE VILLA|1445 UINTA|ALF
11/2/2020|MISSION AT THE VILLA|1445 UINTA DRIVE|ALF
;

proc summary data=to_clean nway;
  class unitname group;
  var subdate;
  output out=need (drop=_:) max=subdate;
run;

data want;
  set to_clean;
  if _n_=1 then do;
    declare hash h (dataset:'need');
	  h.definekey(all:'Y');
	  h.definedone();
  end;
  if h.check()=0;
run;

The proc summary (because of the NWAY option) outputs a dataset with one observation for each unitname/group combination.  It records the maximum value of the analysis variable subdate for each combo, thereby providing the desired unitname/group/subdate values.   

 

Stick that dataset in the hash object h, and check each incoming obs from to_clean against h, and keep only those whose key values (unitname,group,subdate) are found in h (i.e. "if h.check()=0").

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

4 REPLIES 4
novinosrin
Tourmaline | Level 20

Hi @Stalk  I'm not sure of your expected output. However, rather than looking at your code, I'm taking a guess with your bold lines and your description-

"I would like to output last. if the group value is same.

If the group value is different I want to retain both records.(keep both Bold records below)"

 



data to_clean;
infile cards dlm='|' truncover ;
input subDate :mmddyy10. unitName :$100. ADDR1 :$100. group $4.;
format subDate yymmdd10.;
cards;
11/21/2020|BAIRD HUDSON ENTERPRISES|106 E MAIN|TNF
10/30/2020|BAIRD HUDSON ENTERPRISES|106 E MAIN STREET|TNF
10/30/2020|BIG HORN ENTERPRISE|146 S. BENT STREET|SNF
10/30/2020|BIG HORN ENTERPRISE|641 WARREN STREET|SNF
11/5/2020|BROOKDALE |tt|ALF
10/29/2020|BROOKDALE|2401 COUGAR AVENUE|ALF
10/30/2020|ELMCROFT|1551 SUGARLAND DRIVE|ALF
11/2/2020|ELMCROFT|1551 SUGARLAND DRIVE DRIVE|SNF
11/21/2020|GREEN HOUSE LIVING|2311 SHIRLEY|SNF
10/29/2020|GREEN HOUSE LIVING|2311 SHIRLEY COVE|ALF
11/21/2020|MISSION AT THE VILLA|1445 UINTA|ALF
11/2/2020|MISSION AT THE VILLA|1445 UINTA DRIVE|ALF
;

data want;
 do _n_=1 by 1 until(last.unitname);
  set to_clean;
  by unitname group notsorted;
  if first.group then n=sum(n,1);
 end;
 if n=1 then output;
 else do _n_=1 to _n_;
  set to_clean;
  output;
 end;
 drop n;
run;

 

Stalk
Pyrite | Level 9
Records with different group are not captured in the output dataset. From the data above I would like to get 1,3,4,5, 7,8,9,10 observations( eliminate 2 , 6 and 11).
mkeintz
PROC Star

I've restated your objective to "keep the last observation (highest subdate) for each unitname/group combination).  If that's correct, then this application of proc summary and a subsequent use of a hash object will do, which prunes the dataset while preserving the original order of the data (in case that's important).  It requires only 2 passes through the data:

 

data to_clean;
infile cards dlm='|' truncover ;
input subDate :mmddyy10. unitName :$100. ADDR1 :$100. group $4.;
format subDate yymmdd10.;
cards;
11/21/2020|BAIRD HUDSON ENTERPRISES|106 E MAIN|TNF
10/30/2020|BAIRD HUDSON ENTERPRISES|106 E MAIN STREET|TNF
10/30/2020|BIG HORN ENTERPRISE|146 S. BENT STREET|SNF
10/30/2020|BIG HORN ENTERPRISE|641 WARREN STREET|SNF
11/5/2020|BROOKDALE |tt|ALF
10/29/2020|BROOKDALE|2401 COUGAR AVENUE|ALF
10/30/2020|ELMCROFT|1551 SUGARLAND DRIVE|ALF
11/2/2020|ELMCROFT|1551 SUGARLAND DRIVE DRIVE|SNF
11/21/2020|GREEN HOUSE LIVING|2311 SHIRLEY|SNF
10/29/2020|GREEN HOUSE LIVING|2311 SHIRLEY COVE|ALF
11/21/2020|MISSION AT THE VILLA|1445 UINTA|ALF
11/2/2020|MISSION AT THE VILLA|1445 UINTA DRIVE|ALF
;

proc summary data=to_clean nway;
  class unitname group;
  var subdate;
  output out=need (drop=_:) max=subdate;
run;

data want;
  set to_clean;
  if _n_=1 then do;
    declare hash h (dataset:'need');
	  h.definekey(all:'Y');
	  h.definedone();
  end;
  if h.check()=0;
run;

The proc summary (because of the NWAY option) outputs a dataset with one observation for each unitname/group combination.  It records the maximum value of the analysis variable subdate for each combo, thereby providing the desired unitname/group/subdate values.   

 

Stick that dataset in the hash object h, and check each incoming obs from to_clean against h, and keep only those whose key values (unitname,group,subdate) are found in h (i.e. "if h.check()=0").

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Stalk
Pyrite | Level 9
Thank you mkeintz for suggesting Hash object. I really don't understand how hash object works so avoid using that. But I got my program working for the desired results with simple fixxes to sort..

proc sort data=to_clean_2; by unitName goodAddr group descending subDate ; run;

data new;
set to_clean_2;
by unitname goodaddr group;
if first.group then output;
run;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1292 views
  • 0 likes
  • 3 in conversation