I'm trying to find the maximum value from VALUE field group by on two fields (Id, CRF) as shown below. I'd would like to understand whether this program is logically correct. I'm not getting the desired results with this program. I don't wish to create a any new field after converting the VALUE from character to numeric.
Input Data:
Id CRF Value PLAP24MRT_MV [INS]#MAX1 -956461.285475 PLAP24MRT_MV [INS]#MAX2 -30254.271916
Desired Result:
Id CRF Value PLAP24MRT_MV [INS] -30254.271916
Program:
proc Sql; create table Want as select distinct * from ( select Id ,scan(CRF,1,'#') as CRF, ( case when scan(CRF,-1,,'ka')="MIN" then strip(put(min(input(Value,24.6)),24.6)) when scan(CRF,-1,,'ka')="MAX" then strip(put(max(input(Value,24.6)),24.6)) when scan(CRF,-1,,'ka')="AVG" then strip(put(avg(input(Value,24.6)),24.6)) when scan(CRF,-1,,'ka')="STD" then strip(put(std(input(Value,24.6)),24.6)) else '' end ) as Value from INPUT group by Id,CRF); quit;
Hello @David_Billa
Your program logic needs a small correction.
Your source data set has a variable 'CRF'.
In the Proc SQL you are recalculating the variable 'CRF' .
It is this recalculated 'CRF' you want to use.
However the use of 'CRF' in the group by statement cause the PROC SQL to use the original 'CRF' and not the newly calculated one.
Therefore use the calculated keyword before 'CRF' in the group by statement as shown below and you will get the result.
proc Sql;
create table Want as select distinct * from (
select Id
,scan(CRF,1,'#') as CRF, (
case
when scan(CRF,-1,,'ka')="MIN" then strip(put(min(input(Value,24.6)),24.6))
when scan(CRF,-1,,'ka')="MAX" then strip(put(max(input(Value,24.6)),24.6))
when scan(CRF,-1,,'ka')="AVG" then strip(put(avg(input(Value,24.6)),24.6))
when scan(CRF,-1,,'ka')="STD" then strip(put(std(input(Value,24.6)),24.6))
else ''
end )
as Value
from INPUT
group by Id,calculated CRF);
quit;
You will get the desired output as can be seen below
Make a small set of data where you know what the result should be. You should include some of those CRF values with "min","avg" and "std" to properly test your code.
Run your code.
Does it match your expectation?.
@ballardw Tried already. It's not producing the desired results. May I know whether the charcter to numeric and then to character conversion is correct in CASE WHEN as shown in initial post?
I don't use subqueries in Case statements but to select a maximum value by group I would expect to see something like
Proc sql; create table want as select <group vars>, max(numeric expression) from dataset group by <group vars> ; quit;
If you only want to consider values where CRF contains MAX that would be a WHERE subsetting the data set such as:
Proc sql; create table want as select <group vars>, Put(max(numeric expression),24.6) as value from (select * from dataset where scan(CRF,-1,,'ka')='MAX' ) group by <group vars> ; quit;
Or similar.
You want to test what your INPUT is creating. You may find that if you have integers the value is not as expected.
Consider:
data example; x="-956461"; y=input(x,24.6); run;
Look very closely for a decimal in the above. Leading spaces might make things even more entertaining. Input unless you are dealing with fixed implicit decimal places is typically not desired. An F24. informat (or 24.0) will read the numeric values just fine.
Of course one does wonder why a value like character to begin with.
As always, actual example data in the form of a data step might get something that works quicker.
Since your "example" of two lines had none of "avg" "min" or "std" it is real hard for us to test any code.
proc sql;
create table have
(id char(200),CRF char(200), value char(200));
insert into have
values('PLAP24MRT_MV','[INS]#MAX1','-956461.285475')
values('PLAP24MRT_MV', '[INS]#MAX2','-30254.271916');
quit;
data have2;
set have;
new_CRF=scan(CRF,1,'#');
val=input(value,24.6);
run;
proc sql;
create table have3 as
select distinct id, new_CRF,
case when scan(CRF,-1,,'ka')="MIN" then strip(put(min(val),26.4))
when scan(CRF,-1,,'ka')="MAX" then strip(put(max(val),26.4))
when scan(CRF,-1,,'ka')="AVG" then strip(put(mean(val),26.4))
when scan(CRF,-1,,'ka')="STD" then strip(put(std(val),26.4))
else '' end as value
from have2
group by id, new_CRF;
quit;
I don't see the issue? This gets -30254.271916 as the result.
@tarheel13 can we do this in two steps by removing have2 datastep? Also may I know why you're changing the format to 26.4? Can't we retain with 24.6?
The 24.6 was a mistake. But anyway I agree with @ballardw and I don't think you need to use subqueries either. You can put it in one SQL and remove the DATA step. The result is still the same.
proc sql;
create table have3 as
select distinct id, scan(CRF,1,'#') as CRF,
case when scan(CRF,-1,,'ka')="MIN" then strip(put(min(input(value,26.4)),26.4))
when scan(CRF,-1,,'ka')="MAX" then strip(put(max(input(value,26.4)),26.4))
when scan(CRF,-1,,'ka')="AVG" then strip(put(mean(input(value,26.4)),26.4))
when scan(CRF,-1,,'ka')="STD" then strip(put(std(input(value,26.4)),26.4))
else '' end as value
from have2
group by id, CRF;
quit;
@tarheel13 It's not producing the desired results. You can execute and check
proc sql; create table have (id char(200),CRF char(200), value char(200)); insert into have values('PLAP24MRT_MV','[INS]#MAX1','-956461.285475') values('PLAP24MRT_MV', '[INS]#MAX2','-30254.271916'); quit; proc sql; create table have3 as select distinct id, scan(CRF,1,'#') as CRF, case when scan(CRF,-1,,'ka')="MIN" then strip(put(min(input(value,26.4)),26.4)) when scan(CRF,-1,,'ka')="MAX" then strip(put(max(input(value,26.4)),26.4)) when scan(CRF,-1,,'ka')="AVG" then strip(put(mean(input(value,26.4)),26.4)) when scan(CRF,-1,,'ka')="STD" then strip(put(std(input(value,26.4)),26.4)) else '' end as value from have group by id, CRF; quit;
I also told you to remove the group by. Or else do what @Patrick did and include the function in the group by clause.
Also don't think the group by is necessary but again you have not posted your data so it is hard to tell you.
@tarheel13 data which you have in your program is correct. Without group by we won't be getting the desired results.
@David_Billa Getting sample data in the form of a data step with at least two cases wouldn't have hurt.
Does below give you what you're after? I was actually surprised that the functions really only execute on aggregate level during group by processing and not just on row level - but it appears to work.
data have;
input (Id CRF) (:$20.) Value :$20.;
datalines;
PLAP24MRT_MV [INS]#MAX1 -956461.285475
PLAP24MRT_MV [INS]#MAX2 -30254.271916
PLAP24MRT_MV [INS]#MIN1 -956461.285475
PLAP24MRT_MV [INS]#MIN2 -30254.271916
;
proc Sql;
create view v_inter as
select
id,
scan(CRF,1,'#') as CRF length=10,
scan(CRF,-1,,'ka') as function length=3,
input(Value,32.) as value format=24.6
from have
;
create table Want as
select distinct
Id,
CRF,
function,
case function
when "MIN" then min(Value)
when "MAX" then max(Value)
when "AVG" then avg(Value)
when "STD" then std(Value)
else missing(value)
end
as Value format=24.6
from v_inter
group by Id,CRF,Function
;
quit;
Also important:
Do NOT use an informat with a decimal portion unless you know exactly what you're doing.
IF your source string doesn't have a decimal portion then SAS will assume the should be one and divide the number by 10 power decimal portion.
data demo;
input Value 24.6;
format value 24.6;
datalines;
-956461.285475
-956461.
-956461
;
@Patrick thank you. I want the solution in one step instead of two steps. Also I want the VALUE field to be character in the result.
@David_Billa Below how you get value as a string in the result.
Using a view which only executes when called in the 2nd SQL allows imho for code that's easier to read and maintain. From a processing perspective things only execute in the 2nd SQL.
It wouldn't be that hard to move all the logic from the view into the 2nd SQL - but I'm not going to do this as it's just ugly without adding benefit.
proc Sql;
create view v_inter as
select
id,
scan(CRF,1,'#') as CRF length=10,
scan(CRF,-1,,'ka') as function length=3,
input(Value,32.) as value format=24.6
from have
;
create table Want as
select distinct
Id,
CRF,
function,
put(
case function
when "MIN" then min(Value)
when "MAX" then max(Value)
when "AVG" then avg(Value)
when "STD" then std(Value)
else missing(value)
end
, 24.6 -l)
as Value
from v_inter
group by Id,CRF,Function
;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.