BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
David_Billa
Rhodochrosite | Level 12

I'm trying to find the maximum value from VALUE field group by on two fields (Id, CRF) as shown below. I'd would like to understand whether this program is logically correct. I'm not getting the desired results with this program. I don't wish to create a any new field after converting the VALUE from character to numeric.

 

Input Data:

Id	        CRF	         Value
PLAP24MRT_MV	[INS]#MAX1	-956461.285475
PLAP24MRT_MV	[INS]#MAX2	-30254.271916

Desired Result:

 

Id	        CRF	 Value
PLAP24MRT_MV	[INS]	-30254.271916

Program:

 

proc Sql;
    create table Want as select distinct * from (
        select Id
               ,scan(CRF,1,'#') as CRF, (
            case 
                when scan(CRF,-1,,'ka')="MIN" then strip(put(min(input(Value,24.6)),24.6))
                when scan(CRF,-1,,'ka')="MAX" then strip(put(max(input(Value,24.6)),24.6))
                when scan(CRF,-1,,'ka')="AVG" then strip(put(avg(input(Value,24.6)),24.6))
                when scan(CRF,-1,,'ka')="STD" then strip(put(std(input(Value,24.6)),24.6))
                else ''
            end )
        as Value
        from INPUT
        group by Id,CRF);
quit;
1 ACCEPTED SOLUTION

Accepted Solutions
Sajid01
Meteorite | Level 14

Hello @David_Billa 
Your program  logic needs a small correction.
Your source data set has a variable 'CRF'.

In the Proc SQL you are recalculating the variable  'CRF' .
It is this recalculated 'CRF' you want to use.
However the use of 'CRF' in the group by statement cause the PROC SQL to use the original 'CRF' and not the newly calculated one.

Therefore use the calculated keyword before 'CRF' in the group by statement as shown below and you will get the result.

proc Sql;
    create table Want as select distinct * from (
        select Id
               ,scan(CRF,1,'#') as CRF, (
            case 
                when scan(CRF,-1,,'ka')="MIN" then strip(put(min(input(Value,24.6)),24.6))
                when scan(CRF,-1,,'ka')="MAX" then strip(put(max(input(Value,24.6)),24.6))
                when scan(CRF,-1,,'ka')="AVG" then strip(put(avg(input(Value,24.6)),24.6))
                when scan(CRF,-1,,'ka')="STD" then strip(put(std(input(Value,24.6)),24.6))
                else ''
            end )
        as Value
        from INPUT
        group by Id,calculated  CRF);
quit;

You will get the desired output as can be seen below

Sajid01_0-1626200648903.png

 

View solution in original post

22 REPLIES 22
ballardw
Super User

Make a small set of data where you know what the result should be. You should include some of those CRF values with "min","avg" and "std" to properly test your code.

Run your code.

Does it match your expectation?.

David_Billa
Rhodochrosite | Level 12

@ballardw Tried already. It's not producing the desired results. May I know whether the charcter to numeric and then to character conversion is correct in CASE WHEN as shown in initial post?

ballardw
Super User

I don't use subqueries in Case statements but to select a maximum value by group I would expect to see something like

 

 

Proc sql;
   create table want as
   select <group vars>, max(numeric expression)
   from dataset
   group by <group vars>
   ;
quit;

 

 

If you only want to consider values where CRF contains MAX that would be a WHERE subsetting the data set such as:

 

Proc sql;
   create table want as
   select <group vars>, Put(max(numeric expression),24.6) as value
   from (select * from  dataset where scan(CRF,-1,,'ka')='MAX'  )
   group by <group vars>
   ;
quit;

Or similar.
You want to test what your INPUT is creating. You may find that if you have integers the value is not as expected.

 

Consider:

 

data example; 
   x="-956461";
   y=input(x,24.6);
run;

Look very closely for a decimal in the above.  Leading spaces might make things even more entertaining. Input unless you are dealing with fixed implicit decimal places is typically not desired. An F24. informat (or 24.0) will read the numeric values just fine.

 

 

Of course one does wonder why a value like character to begin with.

 

As always, actual example data in the form of a data step might get something that works quicker.

Since your "example" of two lines had none of "avg" "min" or "std" it is real hard for us to test any code.

 

 

 

 

tarheel13
Rhodochrosite | Level 12
proc sql;
	create table have 
	(id char(200),CRF char(200), value char(200));
	insert into have 
	values('PLAP24MRT_MV','[INS]#MAX1','-956461.285475')
	values('PLAP24MRT_MV', '[INS]#MAX2','-30254.271916');	
quit;

data have2;
	set have;
	new_CRF=scan(CRF,1,'#');
	val=input(value,24.6);
run;

proc sql;
	create table have3 as
	select distinct id, new_CRF,
	case when scan(CRF,-1,,'ka')="MIN" then strip(put(min(val),26.4))
    when scan(CRF,-1,,'ka')="MAX" then strip(put(max(val),26.4))
    when scan(CRF,-1,,'ka')="AVG" then strip(put(mean(val),26.4))
    when scan(CRF,-1,,'ka')="STD" then strip(put(std(val),26.4))
    else '' end as value
    	from have2
    	group by id, new_CRF;
quit;

I don't see the issue? This gets -30254.271916 as the result.

David_Billa
Rhodochrosite | Level 12

@tarheel13 can we do this in two steps by removing have2 datastep?  Also may I know why you're changing the format to 26.4? Can't we retain with 24.6?

tarheel13
Rhodochrosite | Level 12

The 24.6 was a mistake. But anyway I agree with @ballardw and I don't think you need to use subqueries either. You can put it in one SQL and remove the DATA step. The result is still the same.

proc sql;
	create table have3 as
	select distinct id, scan(CRF,1,'#') as CRF,
	case when scan(CRF,-1,,'ka')="MIN" then strip(put(min(input(value,26.4)),26.4))
    when scan(CRF,-1,,'ka')="MAX" then strip(put(max(input(value,26.4)),26.4))
    when scan(CRF,-1,,'ka')="AVG" then strip(put(mean(input(value,26.4)),26.4))
    when scan(CRF,-1,,'ka')="STD" then strip(put(std(input(value,26.4)),26.4))
    else '' end as value
    	from have2
    	group by id, CRF;
quit;
David_Billa
Rhodochrosite | Level 12

@tarheel13 It's not producing the desired results. You can execute and check

 

proc sql;
create table have 
	(id char(200),CRF char(200), value char(200));
	insert into have 
	values('PLAP24MRT_MV','[INS]#MAX1','-956461.285475')
	values('PLAP24MRT_MV', '[INS]#MAX2','-30254.271916');	
quit;


proc sql;
	create table have3 as
	select distinct id, scan(CRF,1,'#') as CRF,
	case when scan(CRF,-1,,'ka')="MIN" then strip(put(min(input(value,26.4)),26.4))
    when scan(CRF,-1,,'ka')="MAX" then strip(put(max(input(value,26.4)),26.4))
    when scan(CRF,-1,,'ka')="AVG" then strip(put(mean(input(value,26.4)),26.4))
    when scan(CRF,-1,,'ka')="STD" then strip(put(std(input(value,26.4)),26.4))
    else '' end as value
    	from have
    	group by id, CRF;
quit;
tarheel13
Rhodochrosite | Level 12

I also told you to remove the group by. Or else do what @Patrick did and include the function in the group by clause.

tarheel13
Rhodochrosite | Level 12

Also don't think the group by is necessary but again you have not posted your data so it is hard to tell you.

David_Billa
Rhodochrosite | Level 12

@tarheel13 data which you have in your program is correct. Without group by we won't be getting the desired results.

tarheel13
Rhodochrosite | Level 12
Okay, I am leaving this conversation. The group by also does not get what you want.
Patrick
Opal | Level 21

@David_Billa Getting sample data in the form of a data step with at least two cases wouldn't have hurt.

 

Does below give you what you're after? I was actually surprised that the functions really only execute on aggregate level during group by processing and not just on row level - but it appears to work.

data have;
  input (Id CRF) (:$20.) Value :$20.;
  datalines;
PLAP24MRT_MV [INS]#MAX1 -956461.285475
PLAP24MRT_MV [INS]#MAX2 -30254.271916
PLAP24MRT_MV [INS]#MIN1 -956461.285475
PLAP24MRT_MV [INS]#MIN2 -30254.271916
;

proc Sql;
  create view v_inter as
    select 
      id,
      scan(CRF,1,'#') as CRF length=10,
      scan(CRF,-1,,'ka') as function length=3,
      input(Value,32.) as value format=24.6
    from have
    ; 
  create table Want as 
    select distinct
      Id,
      CRF, 
      function,
      case function 
        when "MIN" then min(Value)
        when "MAX" then max(Value)
        when "AVG" then avg(Value)
        when "STD" then std(Value)
        else missing(value)
      end 
      as Value format=24.6
    from v_inter
    group by Id,CRF,Function
  ;
quit;

Patrick_0-1626160799924.png

Also important:

Do NOT use an informat with a decimal portion unless you know exactly what you're doing.

IF your source string doesn't have a decimal portion then SAS will assume the should be one and divide the number by 10 power decimal portion.

data demo;
  input Value 24.6;
  format value 24.6;
  datalines;
-956461.285475
-956461.
-956461
;

Patrick_1-1626161061488.png

 

David_Billa
Rhodochrosite | Level 12

@Patrick thank you. I want the solution in one step instead of two steps. Also I want the VALUE field to be character in the result. 

Patrick
Opal | Level 21

@David_Billa Below how you get value as a string in the result. 

Using a view which only executes when called in the 2nd SQL allows imho for code that's easier to read and maintain. From a processing perspective things only execute in the 2nd SQL.

It wouldn't be that hard to move all the logic from the view into the 2nd SQL - but I'm not going to do this as it's just ugly without adding benefit.

proc Sql;
  create view v_inter as
    select 
      id,
      scan(CRF,1,'#') as CRF length=10,
      scan(CRF,-1,,'ka') as function length=3,
      input(Value,32.) as value format=24.6
    from have
    ; 
  create table Want as 
    select distinct
      Id,
      CRF, 
      function,
      put(
          case function 
            when "MIN" then min(Value)
            when "MAX" then max(Value)
            when "AVG" then avg(Value)
            when "STD" then std(Value)
            else missing(value)
          end 
          , 24.6 -l)
        as Value 
    from v_inter
    group by Id,CRF,Function
  ;
quit;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 22 replies
  • 2039 views
  • 12 likes
  • 6 in conversation