Stalk
Pyrite | Level 9

This is an extension of my previous question.

I would like to output only the last record (last.) if the group value is the same.

If the group value is different, I want to retain both records (keep both bold records below).

 

data to_clean;
infile cards dlm='|' truncover ;
input subDate :mmddyy10. unitName :$100. ADDR1 :$100. group $4.;
format subDate yymmdd10.;
cards;
11/21/2020|BAIRD HUDSON ENTERPRISES|106 E MAIN|TNF
10/30/2020|BAIRD HUDSON ENTERPRISES|106 E MAIN STREET|TNF
10/30/2020|BIG HORN ENTERPRISE|146 S. BENT STREET|SNF
10/30/2020|BIG HORN ENTERPRISE|641 WARREN STREET|SNF
11/5/2020|BROOKDALE |tt|ALF
10/29/2020|BROOKDALE|2401 COUGAR AVENUE|ALF
10/30/2020|ELMCROFT|1551 SUGARLAND DRIVE|ALF
11/2/2020|ELMCROFT|1551 SUGARLAND DRIVE DRIVE|SNF
11/21/2020|GREEN HOUSE LIVING|2311 SHIRLEY|SNF
10/29/2020|GREEN HOUSE LIVING|2311 SHIRLEY COVE|ALF
11/21/2020|MISSION AT THE VILLA|1445 UINTA|ALF
11/2/2020|MISSION AT THE VILLA|1445 UINTA DRIVE|ALF
;

proc sort data=to_clean; by unitName subDate; run;

proc sql;
create table to_clean_new as
select * from to_clean
order by unitName, subDate, group;
quit;


data to_clean_2;
set to_clean_new;
by unitName;
length goodAddr $100;
retain goodAddr;
if first.unitName then goodAddr = addr1;
else if complev(trim(goodAddr), trim(addr1), "IL:") > 2 then goodAddr = addr1;
run;

proc sort data=to_clean_2; by unitName goodAddr subDate group; run;

data want;
set to_clean_2;
by unitName goodAddr;
if last.goodAddr;
run;

proc print noobs data=want; run;

****

Thank you

1 ACCEPTED SOLUTION

mkeintz
PROC Star

I've restated your objective as "keep the last observation (highest subdate) for each unitname/group combination". If that's correct, then PROC SUMMARY followed by a hash object lookup will do it. This prunes the dataset while preserving the original order of the data (in case that's important), and it requires only 2 passes through the data:

 

data to_clean;
infile cards dlm='|' truncover ;
input subDate :mmddyy10. unitName :$100. ADDR1 :$100. group $4.;
format subDate yymmdd10.;
cards;
11/21/2020|BAIRD HUDSON ENTERPRISES|106 E MAIN|TNF
10/30/2020|BAIRD HUDSON ENTERPRISES|106 E MAIN STREET|TNF
10/30/2020|BIG HORN ENTERPRISE|146 S. BENT STREET|SNF
10/30/2020|BIG HORN ENTERPRISE|641 WARREN STREET|SNF
11/5/2020|BROOKDALE |tt|ALF
10/29/2020|BROOKDALE|2401 COUGAR AVENUE|ALF
10/30/2020|ELMCROFT|1551 SUGARLAND DRIVE|ALF
11/2/2020|ELMCROFT|1551 SUGARLAND DRIVE DRIVE|SNF
11/21/2020|GREEN HOUSE LIVING|2311 SHIRLEY|SNF
10/29/2020|GREEN HOUSE LIVING|2311 SHIRLEY COVE|ALF
11/21/2020|MISSION AT THE VILLA|1445 UINTA|ALF
11/2/2020|MISSION AT THE VILLA|1445 UINTA DRIVE|ALF
;

proc summary data=to_clean nway;
  class unitname group;
  var subdate;
  output out=need (drop=_:) max=subdate;
run;

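data want;
  set to_clean;
  if _n_=1 then do;
    /* Load NEED once; all:'Y' makes every variable in NEED a key */
    declare hash h (dataset:'need');
    h.definekey(all:'Y');
    h.definedone();
  end;
  /* check() returns 0 when the current obs matches a key in h */
  if h.check()=0;
run;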

The proc summary (because of the NWAY option) outputs a dataset with one observation for each unitname/group combination.  It records the maximum value of the analysis variable subdate for each combo, thereby providing the desired unitname/group/subdate values.   
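
As a rough cross-check of what NEED holds, the same one-row-per-combination table can be sketched with a sort and BY-group processing. The dataset names sorted and need_alt are made up for this illustration, and the step is only an alternative view of the PROC SUMMARY output, not part of the original solution:

```sas
/* Sketch only: SORTED and NEED_ALT are hypothetical names. */
/* Order the rows so the latest subdate comes last within each unitname/group. */
proc sort data=to_clean out=sorted;
  by unitname group subdate;
run;

/* Keep the last row of each unitname/group, i.e. the one with the highest subdate. */
data need_alt;
  set sorted;
  by unitname group;
  if last.group;
  keep unitname group subdate;
run;
```

Unlike PROC SUMMARY with CLASS, this route needs a sort and would also keep combinations with a missing unitname or group, so treat it as a way to inspect the idea rather than a drop-in replacement.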

 

Stick that dataset in the hash object h, and check each incoming obs from to_clean against h, and keep only those whose key values (unitname,group,subdate) are found in h (i.e. "if h.check()=0").
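
To make the lookup more explicit, here is a minimal sketch of the same filter with the key variables spelled out instead of all:'Y'. It assumes NEED contains exactly unitname, group and subdate (which is what drop=_: leaves behind); the output name want_explicit is hypothetical:

```sas
/* Sketch only: WANT_EXPLICIT is a hypothetical name. */
data want_explicit;
  set to_clean;
  if _n_ = 1 then do;
    /* Load NEED into memory once, keyed on the three variables it contains. */
    declare hash h (dataset: 'need');
    h.definekey('unitname', 'group', 'subdate');
    h.definedone();
  end;
  /* check() returns 0 when the current unitname/group/subdate is a key in h. */
  if h.check() = 0;
run;
```

Because the test is on key membership, observations that tie for the latest subdate within a unitname/group combination (such as the two BIG HORN ENTERPRISE rows) are both kept.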

 


4 REPLIES
novinosrin
Tourmaline | Level 20

Hi @Stalk, I'm not sure of your expected output. Rather than working from your code, I'm taking a guess based on your bold lines and your description:

"I would like to output last. if the group value is same.

If the group value is different I want to retain both records.(keep both Bold records below)"

 



data to_clean;
infile cards dlm='|' truncover ;
input subDate :mmddyy10. unitName :$100. ADDR1 :$100. group $4.;
format subDate yymmdd10.;
cards;
11/21/2020|BAIRD HUDSON ENTERPRISES|106 E MAIN|TNF
10/30/2020|BAIRD HUDSON ENTERPRISES|106 E MAIN STREET|TNF
10/30/2020|BIG HORN ENTERPRISE|146 S. BENT STREET|SNF
10/30/2020|BIG HORN ENTERPRISE|641 WARREN STREET|SNF
11/5/2020|BROOKDALE |tt|ALF
10/29/2020|BROOKDALE|2401 COUGAR AVENUE|ALF
10/30/2020|ELMCROFT|1551 SUGARLAND DRIVE|ALF
11/2/2020|ELMCROFT|1551 SUGARLAND DRIVE DRIVE|SNF
11/21/2020|GREEN HOUSE LIVING|2311 SHIRLEY|SNF
10/29/2020|GREEN HOUSE LIVING|2311 SHIRLEY COVE|ALF
11/21/2020|MISSION AT THE VILLA|1445 UINTA|ALF
11/2/2020|MISSION AT THE VILLA|1445 UINTA DRIVE|ALF
;

data want;
 do _n_=1 by 1 until(last.unitname);
  set to_clean;
  by unitname group notsorted;
  if first.group then n=sum(n,1);
 end;
 if n=1 then output;
 else do _n_=1 to _n_;
  set to_clean;
  output;
 end;
 drop n;
run;

 

Stalk
Pyrite | Level 9
Records with different groups are not captured in the output dataset. From the data above, I would like to get observations 1, 3, 4, 5, 7, 8, 9, and 10 (eliminating 2, 6, and 11).
Stalk
Pyrite | Level 9
Thank you, mkeintz, for suggesting the hash object. I don't really understand how hash objects work, so I avoid using them. But I got my program working with the desired results through simple fixes to the sort:

proc sort data=to_clean_2; by unitName goodAddr group descending subDate ; run;

data new;
set to_clean_2;
by unitname goodaddr group;
if first.group then output;
run;
