BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ronein
Meteorite | Level 14

Hello,

Each customer  have 2 rows with  information of two models.

I want to create a new columns called Ind with the following results:

If the model that produce the max is "A" then  in row of "A" will have value 1 

If the model that produce the max is "B" then  in row of "B" will have value 1 

If the model that produce the max is both "A" and  "B" then  in row of "A" will have value 1 

What is the way to do it please?

Wanted table is 

Ronein_0-1632652510979.png

Data have;
input CustID Model $ Y;
cards;
1 A 100
1 B 80
2 A 70
2 B 90
3 A 60
3 B 60
;
Run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

This identifies which model has the maximum value of Y. I assume you can create the desired values of IND from this information.

 

proc summary data=have nway;
    class custid;
    var y;
    output out=data_max max=max_y idgroup(max(y) out[1](model)=model_max_y);
run;

 

--
Paige Miller

View solution in original post

9 REPLIES 9
PaigeMiller
Diamond | Level 26

This identifies which model has the maximum value of Y. I assume you can create the desired values of IND from this information.

 

proc summary data=have nway;
    class custid;
    var y;
    output out=data_max max=max_y idgroup(max(y) out[1](model)=model_max_y);
run;

 

--
Paige Miller
PeterClemmensen
Tourmaline | Level 20
Data have;
input CustID Model $ Y;
cards;
1 A 100
1 B 80
2 A 70
2 B 90
3 A 60
3 B 60
;
Run;

data want(drop = mY n);
   do _N_ = 1 by 1 until(last.CustID);
      set have;
      by CustID;
      if Y > mY then do;
         mY = Y;
         n = _N_;
      end;
   end;
   
   do _N_ = 1 to _N_;
      set have;
      Ind = _N_ = n;
      output;
   end;
run;

 

Result:

 

CustID Model Y   Ind
1      A     100 1
1      B     80  0
2      A     70  0
2      B     90  1
3      A     60  1
3      B     60  0
Ksharp
Super User

Hi Peter.C ,

If data look like this, your code would get wrong result.

 

Data have;
input CustID Model $ Y;
cards;
1 A 100
1 B 80
2 A 70
2 B 90
3 B 60
3 A 60
;
Run;

PeterClemmensen
Tourmaline | Level 20

Shoot, you're right. Thanks 🙂

Ksharp
Super User
Data have;
input CustID Model $ Y;
cards;
1 A 100
1 B 80
2 A 70
2 B 90
3 A 60
3 B 60
;
Run;

proc sql;
create table want as
select *,
case 
when range(Y)=0 then 
  case 
  when Model='A' then 1 
  else 0
  end
when Y=max(Y) then 1 else 0 
end as Ind
 from have
  group by CustID;
quit;
tarheel13
Rhodochrosite | Level 12
proc sql;
	create table max as select custID, max(Y) as max
		from have
		group by custid
		having Y=max(Y);
		
	create table want as select distinct h.custid, h.model, h.y, m.max
		from have as h
		left join
		max as m
		on m.custid=h.custid;
quit;

data want2;
	set want;
	if y ne lag(y) then do;
	if y=max then ind=1;
	else ind=0;
		end;
	if y=lag(y) then do;
	if y=max and model='A' then ind=1;
	else ind=0;
		end;
run;
mkeintz
PROC Star

You can merge each observation with the next observation.   So when the current observation is an "A", you can compare y to the next y, otherwise compare y to the previous y.

 

The program requires there to be exactly 2 models per cust_id, with model A preceding the model B.

 

Data have;
input CustID Model $ Y;
cards;
1 A 100
1 B 80
2 A 70
2 B 90
3 A 60
3 B 60
Run;

data want (drop=_:);
  merge have
        have (firstobs=2 keep=y rename=(y=_next_y));
  ind=ifn(model='A',(y>=_next_y),(y>lag(y)));
run;

Please note this IFN function produces the correct results, unlike the apparently equivalent two statements below, which will produce incorrect results for custid4 below:

 

Data have;
input CustID Model $ Y;
cards;
1 A 100
1 B 80
2 A 70
2 B 90
3 A 60
3 B 60
4 A 70
4 B 65
;
Run;

data want (drop=_:);
  merge have
        have (firstobs=2 keep=y rename=(y=_next_y));
  if model='A' then ind=(y>=_next_y);
  else ind=(y>lag(y));
run;

That is because the LAG function is not like a one-row lookback in Excel.  Instead it is a queue-update.  And if the queue is updated only when the model ^='A', then each y for a B is compared to the prior B, not the preceding A.  

But the IFN function will run the LAG for every observation regardless of the model value.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ronein
Meteorite | Level 14
I think it is not working well,
look at ID=1 , the ind variable get value 1 for both rows and need only in one row
Oligolas
Barite | Level 11
DATA have;
input CustID Model $ Y;
cards;
1 A 100
1 B 80
2 A 70
2 B 90
3 A 60
3 B 60
;
RUN;

PROC SORT;BY custID descending Y Model;RUN;

DATA want;
   length ind 8;
   set have;
   BY custID descending Y Model;
   ind=0;
   if first.custID then ind=1;
RUN;

PROC SORT;BY custID Model;RUN;
________________________

- Cheers -

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1203 views
  • 10 likes
  • 7 in conversation