BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ronein
Meteorite | Level 14

Hello,

Each customer  have 2 rows with  information of two models.

I want to create a new columns called Ind with the following results:

If the model that produce the max is "A" then  in row of "A" will have value 1 

If the model that produce the max is "B" then  in row of "B" will have value 1 

If the model that produce the max is both "A" and  "B" then  in row of "A" will have value 1 

What is the way to do it please?

Wanted table is 

Ronein_0-1632652510979.png

Data have;
input CustID Model $ Y;
cards;
1 A 100
1 B 80
2 A 70
2 B 90
3 A 60
3 B 60
;
Run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

This identifies which model has the maximum value of Y. I assume you can create the desired values of IND from this information.

 

proc summary data=have nway;
    class custid;
    var y;
    output out=data_max max=max_y idgroup(max(y) out[1](model)=model_max_y);
run;

 

--
Paige Miller

View solution in original post

9 REPLIES 9
PaigeMiller
Diamond | Level 26

This identifies which model has the maximum value of Y. I assume you can create the desired values of IND from this information.

 

proc summary data=have nway;
    class custid;
    var y;
    output out=data_max max=max_y idgroup(max(y) out[1](model)=model_max_y);
run;

 

--
Paige Miller
PeterClemmensen
Tourmaline | Level 20
Data have;
input CustID Model $ Y;
cards;
1 A 100
1 B 80
2 A 70
2 B 90
3 A 60
3 B 60
;
Run;

data want(drop = mY n);
   do _N_ = 1 by 1 until(last.CustID);
      set have;
      by CustID;
      if Y > mY then do;
         mY = Y;
         n = _N_;
      end;
   end;
   
   do _N_ = 1 to _N_;
      set have;
      Ind = _N_ = n;
      output;
   end;
run;

 

Result:

 

CustID Model Y   Ind
1      A     100 1
1      B     80  0
2      A     70  0
2      B     90  1
3      A     60  1
3      B     60  0
Ksharp
Super User

Hi Peter.C ,

If data look like this, your code would get wrong result.

 

Data have;
input CustID Model $ Y;
cards;
1 A 100
1 B 80
2 A 70
2 B 90
3 B 60
3 A 60
;
Run;

PeterClemmensen
Tourmaline | Level 20

Shoot, you're right. Thanks 🙂

Ksharp
Super User
Data have;
input CustID Model $ Y;
cards;
1 A 100
1 B 80
2 A 70
2 B 90
3 A 60
3 B 60
;
Run;

proc sql;
create table want as
select *,
case 
when range(Y)=0 then 
  case 
  when Model='A' then 1 
  else 0
  end
when Y=max(Y) then 1 else 0 
end as Ind
 from have
  group by CustID;
quit;
tarheel13
Rhodochrosite | Level 12
proc sql;
	create table max as select custID, max(Y) as max
		from have
		group by custid
		having Y=max(Y);
		
	create table want as select distinct h.custid, h.model, h.y, m.max
		from have as h
		left join
		max as m
		on m.custid=h.custid;
quit;

data want2;
	set want;
	if y ne lag(y) then do;
	if y=max then ind=1;
	else ind=0;
		end;
	if y=lag(y) then do;
	if y=max and model='A' then ind=1;
	else ind=0;
		end;
run;
mkeintz
PROC Star

You can merge each observation with the next observation.   So when the current observation is an "A", you can compare y to the next y, otherwise compare y to the previous y.

 

The program requires there to be exactly 2 models per cust_id, with model A preceding the model B.

 

Data have;
input CustID Model $ Y;
cards;
1 A 100
1 B 80
2 A 70
2 B 90
3 A 60
3 B 60
Run;

data want (drop=_:);
  merge have
        have (firstobs=2 keep=y rename=(y=_next_y));
  ind=ifn(model='A',(y>=_next_y),(y>lag(y)));
run;

Please note this IFN function produces the correct results, unlike the apparently equivalent two statements below, which will produce incorrect results for custid4 below:

 

Data have;
input CustID Model $ Y;
cards;
1 A 100
1 B 80
2 A 70
2 B 90
3 A 60
3 B 60
4 A 70
4 B 65
;
Run;

data want (drop=_:);
  merge have
        have (firstobs=2 keep=y rename=(y=_next_y));
  if model='A' then ind=(y>=_next_y);
  else ind=(y>lag(y));
run;

That is because the LAG function is not like a one-row lookback in Excel.  Instead it is a queue-update.  And if the queue is updated only when the model ^='A', then each y for a B is compared to the prior B, not the preceding A.  

But the IFN function will run the LAG for every observation regardless of the model value.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ronein
Meteorite | Level 14
I think it is not working well,
look at ID=1 , the ind variable get value 1 for both rows and need only in one row
Oligolas
Barite | Level 11
DATA have;
input CustID Model $ Y;
cards;
1 A 100
1 B 80
2 A 70
2 B 90
3 A 60
3 B 60
;
RUN;

PROC SORT;BY custID descending Y Model;RUN;

DATA want;
   length ind 8;
   set have;
   BY custID descending Y Model;
   ind=0;
   if first.custID then ind=1;
RUN;

PROC SORT;BY custID Model;RUN;
________________________

- Cheers -

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1486 views
  • 10 likes
  • 7 in conversation