BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
aaronh
Quartz | Level 8

Hello experts,

 

Suppose I have a table like this:

 

data have;
input fruit$ attrib$;
datalines;
apple
apple sweet
lemon sour
banana
banana
orange citrus
orange citrus
;

So in this dataset, a fruit can sometimes have attribute that is complete (like orange), empty (like banana), or somewhere in between (like apple). 

 

The challenge here is that I want to condense the rows into 4 rows, such that if the attribute for fruit X is provided in any row, I will keep that attribute for fruit X. If no attribute is provided for fruit X, I will leave it as is. So the resulting dataset will look like:

 

apple sweet

lemon sour

banana

orange citrus

 

Any ideas will be appreciated.

Thank you,

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20



data have;
input fruit$ attrib$;
datalines;
apple .
apple sweet
lemon sour
banana .
banana .
orange citrus
orange citrus
;

proc sql;
create table want as
select distinct a.*
from have a left join have b
on find(cats(a.fruit,a.attrib), strip(b.fruit))>0
group by a.fruit
having max(length(cats(a.fruit,a.attrib))) =length(cats(a.fruit,a.attrib));
quit;

View solution in original post

8 REPLIES 8
tomrvincent
Rhodochrosite | Level 12
do a distinct by fruit and another distinct by fruit and attrib. Join the 2 together.
Reeza
Super User

Use Sort. 

 

proc sort data=have;
by fruit descending attrib;
run;

proc sort data=have out=want nodupkey;
by fruit;
run;

If you have fruits with different attributes you may run into some issues. 

 


@aaronh wrote:

Hello experts,

 

Suppose I have a table like this:

 

data have;
input fruit$ attrib$;
datalines;
apple
apple sweet
lemon sour
banana
banana
orange citrus
orange citrus
;

So in this dataset, a fruit can sometimes have attribute that is complete (like orange), empty (like banana), or somewhere in between (like apple). 

 

The challenge here is that I want to condense the rows into 4 rows, such that if the attribute for fruit X is provided in any row, I will keep that attribute for fruit X. If no attribute is provided for fruit X, I will leave it as is. So the resulting dataset will look like:

 

apple sweet

lemon sour

banana

orange citrus

 

Any ideas will be appreciated.

Thank you,


 

aaronh
Quartz | Level 8
Thanks Reeza! I was thinking of maybe using a sort first and then using a by group in a data step to choose the first/last in the by group. Which is somewhat similar I guess to what you have
novinosrin
Tourmaline | Level 20



data have;
input fruit$ attrib$;
datalines;
apple .
apple sweet
lemon sour
banana .
banana .
orange citrus
orange citrus
;

proc sql;
create table want as
select distinct a.*
from have a left join have b
on find(cats(a.fruit,a.attrib), strip(b.fruit))>0
group by a.fruit
having max(length(cats(a.fruit,a.attrib))) =length(cats(a.fruit,a.attrib));
quit;

PGStats
Opal | Level 21

Using a hash:

 

data _null_;
set have end=done;
if _n_ = 1 then do;
	declare hash h();
	rc = h.definekey("fruit");
	rc = h.definedata("fruit", "attrib");
	rc = h.definedone();
	end;
if h.check() ne 0 then h.add();
else if not missing(attrib) then h.replace();
if done then h.output(dataset : "want");
run;

It will keep the last attribute encountered for a given fruit.

PG
hashman
Ammonite | Level 13

@aaronh:

Hm. If you real data are patterned the same way you've presented in your data sample, i.e. grouped by [fruit,attrib] with the missing values for attrib coming first, the only thing you need to do is:

data have ;                
  input fruit $ attrib $ ; 
  cards ;                  
apple  .                   
apple  sweet               
lemon  sour                
banana .                   
banana .                   
orange citrus              
orange citrus              
;                          
run ;                      
                           
data want ;                
  set have ;               
  by fruit notsorted ;     
  if last.fruit ;          
run ;                      

If the real input data aren't grouped or sorted and you don't want to sort, just use what @PGStats or @novinosrin have offered.

 

Kind regards

Paul D.  

    

aaronh
Quartz | Level 8

Thank you all very much for your insightful solutions! 

 

@hashman Yes, my real data is actually a bit more complex because I am trying to select the more comprehensive set of addresses (street address 1 and street address 2) for a by-group that consists of a few variables. It seems like using by-group in a data step would be the easiest way to comprehend or trouble-shoot.

 

@PGStats and @novinosrin : thank you both for the solutions! At first glance, it seems like a simple logic to a human mind, but to implement it in SAS, it can be much more complicated. 

 

Once again, I really appreciate the inputs from you all!

tomrvincent
Rhodochrosite | Level 12
PROC SQL;
	CREATE TABLE WORK.foo AS 
		SELECT f.fruit, 
			a.attrib
		FROM (SELECT DISTINCT fruit FROM USER.HAVE) f
			FULL JOIN (SELECT DISTINCT fruit, attrib FROM USER.HAVE WHERE attrib NOT IS MISSING) a 
				ON f.fruit = a.fruit;
QUIT;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 962 views
  • 10 likes
  • 6 in conversation