BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
aaronh
Quartz | Level 8

Hello experts,

 

Suppose I have a table like this:

 

data have;
input fruit$ attrib$;
datalines;
apple
apple sweet
lemon sour
banana
banana
orange citrus
orange citrus
;

So in this dataset, a fruit can sometimes have attribute that is complete (like orange), empty (like banana), or somewhere in between (like apple). 

 

The challenge here is that I want to condense the rows into 4 rows, such that if the attribute for fruit X is provided in any row, I will keep that attribute for fruit X. If no attribute is provided for fruit X, I will leave it as is. So the resulting dataset will look like:

 

apple sweet

lemon sour

banana

orange citrus

 

Any ideas will be appreciated.

Thank you,

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20



data have;
input fruit$ attrib$;
datalines;
apple .
apple sweet
lemon sour
banana .
banana .
orange citrus
orange citrus
;

proc sql;
create table want as
select distinct a.*
from have a left join have b
on find(cats(a.fruit,a.attrib), strip(b.fruit))>0
group by a.fruit
having max(length(cats(a.fruit,a.attrib))) =length(cats(a.fruit,a.attrib));
quit;

View solution in original post

8 REPLIES 8
tomrvincent
Rhodochrosite | Level 12
do a distinct by fruit and another distinct by fruit and attrib. Join the 2 together.
Reeza
Super User

Use Sort. 

 

proc sort data=have;
by fruit descending attrib;
run;

proc sort data=have out=want nodupkey;
by fruit;
run;

If you have fruits with different attributes you may run into some issues. 

 


@aaronh wrote:

Hello experts,

 

Suppose I have a table like this:

 

data have;
input fruit$ attrib$;
datalines;
apple
apple sweet
lemon sour
banana
banana
orange citrus
orange citrus
;

So in this dataset, a fruit can sometimes have attribute that is complete (like orange), empty (like banana), or somewhere in between (like apple). 

 

The challenge here is that I want to condense the rows into 4 rows, such that if the attribute for fruit X is provided in any row, I will keep that attribute for fruit X. If no attribute is provided for fruit X, I will leave it as is. So the resulting dataset will look like:

 

apple sweet

lemon sour

banana

orange citrus

 

Any ideas will be appreciated.

Thank you,


 

aaronh
Quartz | Level 8
Thanks Reeza! I was thinking of maybe using a sort first and then using a by group in a data step to choose the first/last in the by group. Which is somewhat similar I guess to what you have
novinosrin
Tourmaline | Level 20



data have;
input fruit$ attrib$;
datalines;
apple .
apple sweet
lemon sour
banana .
banana .
orange citrus
orange citrus
;

proc sql;
create table want as
select distinct a.*
from have a left join have b
on find(cats(a.fruit,a.attrib), strip(b.fruit))>0
group by a.fruit
having max(length(cats(a.fruit,a.attrib))) =length(cats(a.fruit,a.attrib));
quit;

PGStats
Opal | Level 21

Using a hash:

 

data _null_;
set have end=done;
if _n_ = 1 then do;
	declare hash h();
	rc = h.definekey("fruit");
	rc = h.definedata("fruit", "attrib");
	rc = h.definedone();
	end;
if h.check() ne 0 then h.add();
else if not missing(attrib) then h.replace();
if done then h.output(dataset : "want");
run;

It will keep the last attribute encountered for a given fruit.

PG
hashman
Ammonite | Level 13

@aaronh:

Hm. If you real data are patterned the same way you've presented in your data sample, i.e. grouped by [fruit,attrib] with the missing values for attrib coming first, the only thing you need to do is:

data have ;                
  input fruit $ attrib $ ; 
  cards ;                  
apple  .                   
apple  sweet               
lemon  sour                
banana .                   
banana .                   
orange citrus              
orange citrus              
;                          
run ;                      
                           
data want ;                
  set have ;               
  by fruit notsorted ;     
  if last.fruit ;          
run ;                      

If the real input data aren't grouped or sorted and you don't want to sort, just use what @PGStats or @novinosrin have offered.

 

Kind regards

Paul D.  

    

aaronh
Quartz | Level 8

Thank you all very much for your insightful solutions! 

 

@hashman Yes, my real data is actually a bit more complex because I am trying to select the more comprehensive set of addresses (street address 1 and street address 2) for a by-group that consists of a few variables. It seems like using by-group in a data step would be the easiest way to comprehend or trouble-shoot.

 

@PGStats and @novinosrin : thank you both for the solutions! At first glance, it seems like a simple logic to a human mind, but to implement it in SAS, it can be much more complicated. 

 

Once again, I really appreciate the inputs from you all!

tomrvincent
Rhodochrosite | Level 12
PROC SQL;
	CREATE TABLE WORK.foo AS 
		SELECT f.fruit, 
			a.attrib
		FROM (SELECT DISTINCT fruit FROM USER.HAVE) f
			FULL JOIN (SELECT DISTINCT fruit, attrib FROM USER.HAVE WHERE attrib NOT IS MISSING) a 
				ON f.fruit = a.fruit;
QUIT;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1621 views
  • 10 likes
  • 6 in conversation