Hello experts,
Suppose I have a table like this:
data have;
input fruit$ attrib$;
datalines;
apple
apple sweet
lemon sour
banana
banana
orange citrus
orange citrus
;
So in this dataset, a fruit can sometimes have attribute that is complete (like orange), empty (like banana), or somewhere in between (like apple).
The challenge here is that I want to condense the rows into 4 rows, such that if the attribute for fruit X is provided in any row, I will keep that attribute for fruit X. If no attribute is provided for fruit X, I will leave it as is. So the resulting dataset will look like:
apple sweet
lemon sour
banana
orange citrus
Any ideas will be appreciated.
Thank you,
data have;
input fruit$ attrib$;
datalines;
apple .
apple sweet
lemon sour
banana .
banana .
orange citrus
orange citrus
;
proc sql;
create table want as
select distinct a.*
from have a left join have b
on find(cats(a.fruit,a.attrib), strip(b.fruit))>0
group by a.fruit
having max(length(cats(a.fruit,a.attrib))) =length(cats(a.fruit,a.attrib));
quit;
Use Sort.
proc sort data=have;
by fruit descending attrib;
run;
proc sort data=have out=want nodupkey;
by fruit;
run;
If you have fruits with different attributes you may run into some issues.
@aaronh wrote:
Hello experts,
Suppose I have a table like this:
data have; input fruit$ attrib$; datalines; apple apple sweet lemon sour banana banana orange citrus orange citrus ;
So in this dataset, a fruit can sometimes have attribute that is complete (like orange), empty (like banana), or somewhere in between (like apple).
The challenge here is that I want to condense the rows into 4 rows, such that if the attribute for fruit X is provided in any row, I will keep that attribute for fruit X. If no attribute is provided for fruit X, I will leave it as is. So the resulting dataset will look like:
apple sweet
lemon sour
banana
orange citrus
Any ideas will be appreciated.
Thank you,
data have;
input fruit$ attrib$;
datalines;
apple .
apple sweet
lemon sour
banana .
banana .
orange citrus
orange citrus
;
proc sql;
create table want as
select distinct a.*
from have a left join have b
on find(cats(a.fruit,a.attrib), strip(b.fruit))>0
group by a.fruit
having max(length(cats(a.fruit,a.attrib))) =length(cats(a.fruit,a.attrib));
quit;
Using a hash:
data _null_;
set have end=done;
if _n_ = 1 then do;
declare hash h();
rc = h.definekey("fruit");
rc = h.definedata("fruit", "attrib");
rc = h.definedone();
end;
if h.check() ne 0 then h.add();
else if not missing(attrib) then h.replace();
if done then h.output(dataset : "want");
run;
It will keep the last attribute encountered for a given fruit.
Hm. If you real data are patterned the same way you've presented in your data sample, i.e. grouped by [fruit,attrib] with the missing values for attrib coming first, the only thing you need to do is:
data have ;
input fruit $ attrib $ ;
cards ;
apple .
apple sweet
lemon sour
banana .
banana .
orange citrus
orange citrus
;
run ;
data want ;
set have ;
by fruit notsorted ;
if last.fruit ;
run ;
If the real input data aren't grouped or sorted and you don't want to sort, just use what @PGStats or @novinosrin have offered.
Kind regards
Paul D.
Thank you all very much for your insightful solutions!
@hashman Yes, my real data is actually a bit more complex because I am trying to select the more comprehensive set of addresses (street address 1 and street address 2) for a by-group that consists of a few variables. It seems like using by-group in a data step would be the easiest way to comprehend or trouble-shoot.
@PGStats and @novinosrin : thank you both for the solutions! At first glance, it seems like a simple logic to a human mind, but to implement it in SAS, it can be much more complicated.
Once again, I really appreciate the inputs from you all!
PROC SQL;
CREATE TABLE WORK.foo AS
SELECT f.fruit,
a.attrib
FROM (SELECT DISTINCT fruit FROM USER.HAVE) f
FULL JOIN (SELECT DISTINCT fruit, attrib FROM USER.HAVE WHERE attrib NOT IS MISSING) a
ON f.fruit = a.fruit;
QUIT;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.