Solved: New column with Ind of max

Ronein · Posted 09-26-2021 06:35 AM

Hello,

Each customer has 2 rows, one for each of two models.

I want to create a new column called Ind with the following results:

If the model that produces the max is "A", then the "A" row gets the value 1.

If the model that produces the max is "B", then the "B" row gets the value 1.

If "A" and "B" tie for the max, then the "A" row gets the value 1.

What is the way to do this, please?

Wanted table is

Data have;
input CustID Model $ Y;
cards;
1 A 100
1 B 80
2 A 70
2 B 90
3 A 60
3 B 60
;
Run;

PaigeMiller · Posted 09-26-2021 06:59 AM

This identifies which model has the maximum value of Y. I assume you can create the desired values of IND from this information.

proc summary data=have nway;
    class custid;
    var y;
    output out=data_max max=max_y idgroup(max(y) out[1](model)=model_max_y);
run;
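
A minimal sketch of that last step, assuming have is sorted by CustID and data_max is the output of the PROC SUMMARY step above. The data set name want is an assumption here, not part of the original answer. As far as I know, IDGROUP without the LAST option keeps the first observation it encounters on a tie, so A wins a tie only when the A row comes first within the customer:

```sas
/* Sketch only: flag the row whose Model matches the model that produced the max. */
/* Assumes have is sorted by CustID and data_max comes from the PROC SUMMARY step. */
data want(drop=model_max_y);
    merge have
          data_max(keep=custid model_max_y);
    by custid;
    /* Boolean expression: 1 when this row's model is the max model, else 0 */
    ind = (model = model_max_y);
run;
```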

--
Paige Miller

PeterClemmensen · Posted 09-26-2021 08:36 AM

Data have;
input CustID Model $ Y;
cards;
1 A 100
1 B 80
2 A 70
2 B 90
3 A 60
3 B 60
;
Run;

data want(drop = mY n);
   do _N_ = 1 by 1 until(last.CustID);
      set have;
      by CustID;
      if Y > mY then do;
         mY = Y;
         n = _N_;
      end;
   end;
   
   do _N_ = 1 to _N_;
      set have;
      Ind = _N_ = n;
      output;
   end;
run;

Result:

CustID Model Y   Ind
1      A     100 1
1      B     80  0
2      A     70  0
2      B     90  1
3      A     60  1
3      B     60  0

The DATA to DATA Step Macro
Blog: SASnrd

Ksharp · Posted 09-26-2021 08:42 AM

Hi Peter.C,

If the data look like this, your code gives the wrong result: for CustID 3 the B row comes first and gets Ind=1, although A ties for the max.

Data have;
input CustID Model $ Y;
cards;
1 A 100
1 B 80
2 A 70
2 B 90
3 B 60
3 A 60
;
Run;

PeterClemmensen · Posted 09-27-2021 01:39 AM

Shoot, you're right. Thanks 🙂
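
One possible repair, sketched here as an assumption rather than as the poster's own fix: let an equal Y replace the current maximum when the row is model A, so the tie goes to A regardless of row order. It still assumes have is sorted by CustID.

```sas
data want(drop = mY n);
   /* First pass over the BY group: find the position n of the winning row */
   do _N_ = 1 by 1 until(last.CustID);
      set have;
      by CustID;
      /* Take a strictly larger Y, or an equal Y when this row is model A */
      if Y > mY or (Y = mY and Model = 'A') then do;
         mY = Y;
         n = _N_;
      end;
   end;

   /* Second pass over the same rows: flag the winning row and write all rows */
   do _N_ = 1 to _N_;
      set have;
      Ind = (_N_ = n);
      output;
   end;
run;
```

With the reordered data above, CustID 3 reads B 60 first (n = 1) and then A 60, which ties and is model A, so n becomes 2 and the A row gets Ind=1.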

Ksharp · Posted 09-26-2021 08:38 AM

Data have;
input CustID Model $ Y;
cards;
1 A 100
1 B 80
2 A 70
2 B 90
3 A 60
3 B 60
;
Run;

proc sql;
create table want as
select *,
case 
when range(Y)=0 then 
  case 
  when Model='A' then 1 
  else 0
  end
when Y=max(Y) then 1 else 0 
end as Ind
 from have
  group by CustID;
quit;

tarheel13 · Posted 09-26-2021 09:01 AM

proc sql;
	create table max as select custID, max(Y) as max
		from have
		group by custid
		having Y=max(Y);
		
	create table want as select distinct h.custid, h.model, h.y, m.max
		from have as h
		left join
		max as m
		on m.custid=h.custid;
quit;

data want2;
	set want;
	if y ne lag(y) then do;
		if y=max then ind=1;
		else ind=0;
	end;
	if y=lag(y) then do;
		if y=max and model='A' then ind=1;
		else ind=0;
	end;
run;

mkeintz · Posted 09-26-2021 04:49 PM

You can merge each observation with the next observation. So when the current observation is an "A", you can compare y to the next y, otherwise compare y to the previous y.

The program requires exactly 2 models per CustID, with model A preceding model B.

Data have;
input CustID Model $ Y;
cards;
1 A 100
1 B 80
2 A 70
2 B 90
3 A 60
3 B 60
;
Run;

data want (drop=_:);
  merge have
        have (firstobs=2 keep=y rename=(y=_next_y));
  ind=ifn(model='A',(y>=_next_y),(y>lag(y)));
run;

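Please note that this IFN function produces the correct results, unlike the apparently equivalent two statements below, which produce incorrect results for CustID 4 in the data below (and also for CustID 1, where the first B is compared to a missing value, which sorts lower than any number):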

Data have;
input CustID Model $ Y;
cards;
1 A 100
1 B 80
2 A 70
2 B 90
3 A 60
3 B 60
4 A 70
4 B 65
;
Run;

data want (drop=_:);
  merge have
        have (firstobs=2 keep=y rename=(y=_next_y));
  if model='A' then ind=(y>=_next_y);
  else ind=(y>lag(y));
run;

That is because the LAG function is not like a one-row lookback in Excel. Instead, it is a queue update: each LAG call keeps its own queue, which changes only when that call executes. If the queue is updated only when model ^= 'A', then each y for a B is compared to the prior B, not the preceding A.

But the IFN function evaluates all of its arguments, so it runs the LAG for every observation regardless of the model value.
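
A small illustration of the queue behaviour, as a sketch with made-up names (demo, prev_always and prev_cond are assumptions for this example only). It runs one LAG on every row and a second LAG only on B rows:

```sas
data demo;
   input Model $ Y;
   /* Unconditional: this LAG queue is updated on every row */
   prev_always = lag(Y);
   /* Conditional: this separate LAG queue is updated only on B rows */
   if Model = 'B' then prev_cond = lag(Y);
   cards;
A 100
B 80
A 70
B 90
;
run;

proc print data=demo;
run;
```

prev_always holds the previous row's Y (missing, 100, 80, 70). prev_cond is missing on the A rows and on the first B, and is 80 on the second B: it holds the Y of the prior B row, not the Y of the A row just before it.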

Ronein · Posted 09-29-2021 01:20 AM

I think it is not working correctly.
Look at CustID=1: the Ind variable gets the value 1 in both rows, but it should be 1 in only one row.

Oligolas · Posted 09-27-2021 10:49 AM

DATA have;
input CustID Model $ Y;
cards;
1 A 100
1 B 80
2 A 70
2 B 90
3 A 60
3 B 60
;
RUN;

PROC SORT data=have; BY custID descending Y Model; RUN;

DATA want;
   length ind 8;
   set have;
   BY custID descending Y Model;
   ind=0;
   if first.custID then ind=1;
RUN;

PROC SORT data=want; BY custID Model; RUN;

________________________
- Cheers -
