hi,
in my dataset, i have incm1, incm2, incm3 and incm4 for each customer. some have only 1 income, some have 2 or more. i want to create a new variable with the max value of the 4 incomes and at the same time, i need to create a flag indicating which income is the max.
for example, the max income of a record is the incm1, the new income value is easy, it's just new_income=max(incm1,incm2,incm3,incm4).
i know using if statement and matching the new income value and the 4 income values can show me the source of the income.
if new_income=incm1 then flag='incm1'; else if new_income=incm2 then flag='incm2'; etc..
but are there any other ways to create the new income flag which shows 'incm1' for this record?
thanks
You could use an array and function vname as follows :
data have;
input incm1 incm2 incm3 incm4;
datalines;
1 2 3 4
;
data want;
set have;
array inc{*} incm:;
incMax = max(of inc{*});
do _n_ = 1 to dim(inc);
if inc{_n_} = incMax then incId = vname(inc{_n_});
end;
run;
proc print; run;
PG
You could use an array and function vname as follows :
data have;
input incm1 incm2 incm3 incm4;
datalines;
1 2 3 4
;
data want;
set have;
array inc{*} incm:;
incMax = max(of inc{*});
do _n_ = 1 to dim(inc);
if inc{_n_} = incMax then incId = vname(inc{_n_});
end;
run;
proc print; run;
PG
this one is great. thank you very much.
PG,
You also can use whichn() to instead of DO LOOP . make it more succinct .
Hi Ksharp,
hope you are doing good.
i still have question understanding this concept..
I expect x0 to have a value of 5!!!!!!!becos . occupies the 5th position????????
why is x0 in the below example having a value of missing???
Thanks
data _null_;
array dates
Magna_Carta Gutenberg
(1492 1066 325 1776 . 1215 1450);
x0=whichn(., of dates
x1=whichn(1492, of dates
x2=whichn(1066, of dates
x3=whichn(1450, of dates
x4=whichn(1000, of dates
put x0= / x1= / x2= / x3= / x4=;
run;
SAS writes the following output to the log:
x0=.
x1=1
x2=2
x3=7
x4=0
It looks like whichn() will ignore missing value, like sum() , mean()
Hi,
Could you please explain :
do _n_ = 1 to dim(inc);
Generally _n_ is from top to bottom. Here we are saying to do it across(column wise)????
Also in the below log why is it performing four times???it should be in a single step???
Thanks
52 data want;
53 set have;
54 array inc{*} incm:;
55 incMax = max(of inc{*});
56 do _n_ = 1 to dim(inc);
57 if inc{_n_} = incMax then incId = vname(inc{_n_});
58 putlog _all_;
59 end;
60 run;
incm1=1 incm2=2 incm3=3 incm4=4 incMax=4 incId= _ERROR_=0 _N_=1
incm1=1 incm2=2 incm3=3 incm4=4 incMax=4 incId= _ERROR_=0 _N_=2
incm1=1 incm2=2 incm3=3 incm4=4 incMax=4 incId= _ERROR_=0 _N_=3
incm1=1 incm2=2 incm3=3 incm4=4 incMax=4 incId=incm4 _ERROR_=0 _N_=4
Hi,
Variable _n_ is always created by SAS within a datastep. Normally _n_ counts the number of iterations of the datastep and is not saved in the resulting dataset. When you don't need its value, you can do anything you want with _n_. In my little program I used it as the control variable for my DO loop. That cheap trick saves me the trouble of dropping the variable (since it is not saved).
Array inc contains all variables beginning with incm (incm1, incm2, incm3, incm4). The DO loop scans the elements of array inc to find which one has the maximum value. It has to loop dim(inc) = 4 times. Each time it loops, printlog executes. Thus it prints four lines on the log. You can see that it finds the maximum value on the last iteration.
hth
PG
Also, like pointed out, you could also use:
data want2;
set have;
array inc{*} incm:;
incId = vname(inc{whichn(max(of inc{*}),of inc{*})});
run;
to avoid programming the loop.
Quiz
1) The two codes do not always give the same result for incId. Can you tell how?
2) Can you modify the looping code so that both codes will give the same result?
PG
I'm guessing it can happen when two or more columns share the max value?
Good guess! What happens then?
I guess the variable name of last iteration with maximum value is outputted??? If it is right then how do we correct this while using whichn?
Thanks
You are right! The looping code will return the last maximum whereas whichn will return the first one. I would find it easier to modify the looping algorithm to match the whichn algorithm than the other way around.I can think of two simple ways to get the looping code to return the first max. I will post them over the weekend if you don't find them first!
PG
Hi PG,
I could not think of a proper solution for the loop to return the max value...
Could you help me with that/
Thanks
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.