Hello experts,
Suppose I have a table like this:
data have;
input fruit$ attrib$;
datalines;
apple
apple sweet
lemon sour
banana
banana
orange citrus
orange citrus
;
So in this dataset, a fruit can sometimes have attribute that is complete (like orange), empty (like banana), or somewhere in between (like apple).
The challenge here is that I want to condense the rows into 4 rows, such that if the attribute for fruit X is provided in any row, I will keep that attribute for fruit X. If no attribute is provided for fruit X, I will leave it as is. So the resulting dataset will look like:
apple sweet
lemon sour
banana
orange citrus
Any ideas will be appreciated.
Thank you,
data have;
input fruit$ attrib$;
datalines;
apple .
apple sweet
lemon sour
banana .
banana .
orange citrus
orange citrus
;
proc sql;
create table want as
select distinct a.*
from have a left join have b
on find(cats(a.fruit,a.attrib), strip(b.fruit))>0
group by a.fruit
having max(length(cats(a.fruit,a.attrib))) =length(cats(a.fruit,a.attrib));
quit;
Use Sort.
proc sort data=have;
by fruit descending attrib;
run;
proc sort data=have out=want nodupkey;
by fruit;
run;
If you have fruits with different attributes you may run into some issues.
@aaronh wrote:
Hello experts,
Suppose I have a table like this:
data have; input fruit$ attrib$; datalines; apple apple sweet lemon sour banana banana orange citrus orange citrus ;
So in this dataset, a fruit can sometimes have attribute that is complete (like orange), empty (like banana), or somewhere in between (like apple).
The challenge here is that I want to condense the rows into 4 rows, such that if the attribute for fruit X is provided in any row, I will keep that attribute for fruit X. If no attribute is provided for fruit X, I will leave it as is. So the resulting dataset will look like:
apple sweet
lemon sour
banana
orange citrus
Any ideas will be appreciated.
Thank you,
data have;
input fruit$ attrib$;
datalines;
apple .
apple sweet
lemon sour
banana .
banana .
orange citrus
orange citrus
;
proc sql;
create table want as
select distinct a.*
from have a left join have b
on find(cats(a.fruit,a.attrib), strip(b.fruit))>0
group by a.fruit
having max(length(cats(a.fruit,a.attrib))) =length(cats(a.fruit,a.attrib));
quit;
Using a hash:
data _null_;
set have end=done;
if _n_ = 1 then do;
declare hash h();
rc = h.definekey("fruit");
rc = h.definedata("fruit", "attrib");
rc = h.definedone();
end;
if h.check() ne 0 then h.add();
else if not missing(attrib) then h.replace();
if done then h.output(dataset : "want");
run;
It will keep the last attribute encountered for a given fruit.
Hm. If you real data are patterned the same way you've presented in your data sample, i.e. grouped by [fruit,attrib] with the missing values for attrib coming first, the only thing you need to do is:
data have ;
input fruit $ attrib $ ;
cards ;
apple .
apple sweet
lemon sour
banana .
banana .
orange citrus
orange citrus
;
run ;
data want ;
set have ;
by fruit notsorted ;
if last.fruit ;
run ;
If the real input data aren't grouped or sorted and you don't want to sort, just use what @PGStats or @novinosrin have offered.
Kind regards
Paul D.
Thank you all very much for your insightful solutions!
@hashman Yes, my real data is actually a bit more complex because I am trying to select the more comprehensive set of addresses (street address 1 and street address 2) for a by-group that consists of a few variables. It seems like using by-group in a data step would be the easiest way to comprehend or trouble-shoot.
@PGStats and @novinosrin : thank you both for the solutions! At first glance, it seems like a simple logic to a human mind, but to implement it in SAS, it can be much more complicated.
Once again, I really appreciate the inputs from you all!
PROC SQL;
CREATE TABLE WORK.foo AS
SELECT f.fruit,
a.attrib
FROM (SELECT DISTINCT fruit FROM USER.HAVE) f
FULL JOIN (SELECT DISTINCT fruit, attrib FROM USER.HAVE WHERE attrib NOT IS MISSING) a
ON f.fruit = a.fruit;
QUIT;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.