Below is my input data
data have; input Col1 $ Col2 Col3; datalines; A 1 2 A 3 4 A 4 8 A 6 5 B 1 . B 2 . B 4 . B 5 . ;
Now I would like to fill the Col3 data for B, based on the matched Col2 for A. If any Col2 data for B are missing from A, either the max or min of Col2 values from entire records for A will be used.
Hence, the output of B should be:
B 1 2
B 2 8 for max or 2 for min
B 4 8
B 5 8 for max or 2 for min
Many thanks for your help.
data have;
input Col1 $ Col2 Col3;
datalines;
A 1 2
A 3 4
A 4 8
A 6 5
B 1 .
B 2 .
B 4 .
B 5 .
;
proc sql;
create table want as
select Col1,Col2,coalesce(
(select Col3 from have where col1='A' and col2=a.col2),
(select max(Col3) from have where col1='A')
) as Col3
from have as a;
quit;
Find max/min grouped by data fields (I will call these fields FIELD1 and FIELD2)
proc summary data=have nway;
class field1 field2;
var yourvarname;
output out=want min= max=/autoname;
run;
Replace missing data with them ... more explanation is needed. Missing data can be replaced by one value, but missing data cannot be replaced by "them" (which indicates to me replace the missing with both the max and the min).
In the future, please give a more complete explanation of the problem, rather than just the bare minimum that you have typed in this case.
Is there a question here?
What is your input data?
What output do you want for that particular input?
What missing data do you want to replace? What are the rules for determining what to replace it with?
Thank you for providing a working DATA step with sample data. You haven't provided a rule for determining when to replace missing values of col3 with the max value vs the min, so this code always assigns the max.
data want (drop=_:);
set have;
if _n_=1 then do;
declare hash h ();
h.definekey('col2');
h.definedata('col3');
h.definedone();
end;
if col1='A' then do;
h.add();
_min=min(_min,col3);
_max=max(_max,col3);
retain _min _max;
end;
else if h.find() ^=0 then col3=_max;
run;
It assumes that all the COL='A' observations are at the beginning of the data set, and that all other obs need to use the data from 'A' to determine col3.
data have;
input Col1 $ Col2 Col3;
datalines;
A 1 2
A 3 4
A 4 8
A 6 5
B 1 .
B 2 .
B 4 .
B 5 .
;
proc sql;
create table want as
select Col1,Col2,coalesce(
(select Col3 from have where col1='A' and col2=a.col2),
(select max(Col3) from have where col1='A')
) as Col3
from have as a;
quit;
Thanks both, which all work for me.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.