Hi,
I would be grateful for help with the following:
Data:
Data I_have;
input Dfm1$ Dfm2$ Dfm3$ Dfm4$ bal1 bal2 bal3 bal4 disc_amt;
Datalines;
y y y y 200 5 33 50 40
N N N y 100 44 22 24 50
N N N y 42 22 22 300 500
N N Y N 55 200 300 100 12
N N y y 500 99 15 400 14
;run;
Goal: Record the position of the first occurrence of "Y" in the series of variables referenced by the array LDF in a separate variable called Def_month, and at that same point take Def_balance from the corresponding balance variable, as shown in the code below.
I have completed the code to this extent. Now, if the balance (the series of bal variables, referenced through the array Lbal) is less than or equal to 100 both at the first occurrence of "Y" and just before it, then Def_balance should equal the variable disc_amt.
For example:
For the first observation, the first "Y" occurs in Dfm1, so Def_month gets the value 1. Since this occurrence comes from the first variable, Def_balance=bal1, that is, 200.
For the second observation, the first "Y" occurs in Dfm4, so Def_month gets the value 4. Since this occurrence comes from the fourth variable, Def_balance would be bal4, that is, 24. But because the extra condition discussed above is fulfilled, Def_balance should instead be 50, the value of disc_amt (the needed output). This happens because the balances at Dfm3 and Dfm4 (bal3=22 and bal4=24) are both less than 100.
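Data I_get;
   set I_have;
   array LDF{*}$ dfm1-dfm4;
   array Lbal{*} bal1-bal4;
   do j=1 to dim(LDF);
      /* this comparison is case-sensitive, so the uppercase "Y" in observation 4 is not found */
      if LDF(j) ="y" then do;
         Def_month=j;              /* position of the first "y" */
         Def_balance=Lbal(j);      /* balance at that same position */
         leave;
      end;
   end;
run;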
Desired output:
Data I_wanna;
input Dfm1$ Dfm2$ Dfm3$ Dfm4$ bal1 bal2 bal3 bal4 disc_amt Def_month Def_balance;
Datalines;
y y y y 200 5 33 50 40 1 200
N N N y 100 44 22 24 50 4 50
N N N y 42 22 22 300 500 4 300
N N Y N 55 200 300 100 12 3 300
N N y y 500 99 15 400 14 3 14
;run;
You were close with your own code. I only had to add one line and make a minor adjustment to another:
Data I_want;
set I_have;
array LDF{*}$ dfm1-dfm4;
array Lbal{*} bal1-bal4;
do j=1 to dim(LDF);
if upcase(LDF(j)) ="Y" then do;
Def_month=j;
Def_balance=ifn(lbal(j) lt 100,disc_amt,Lbal(j));
leave;
end;
end;
run;
Hi Art,
Thank you.
In this case, we are not checking the lag of the first occurrence of "Y"; the code only looks at the point where the first "Y" occurred.
Just to make the matter clear, I have added a sixth observation to the data to show you how the code affects the results:
Data I_have;
input Dfm1$ Dfm2$ Dfm3$ Dfm4$ bal1 bal2 bal3 bal4 disc_amt;
Datalines;
y y y y 200 5 33 50 40
N N N y 100 44 22 24 50
N N N y 42 22 22 300 500
N N Y N 55 200 300 100 12
N N y y 500 99 15 400 14
N N N y 100 44 200 24 50
;
run;
The data above is exactly the same as before, except that I have added one observation. In the sixth observation, the first occurrence of "Y" is at Dfm4, and at that point bal4 is 24 (smaller than 100). But the second condition must also be fulfilled: the lag of the first occurrence (the balance just before it) must also be lower than 100. Only then can we have Def_balance=disc_amt. Here the first lag is bal3, which is 200 and not lower than 100, so Def_balance should be 24, not 50. But the code produces 50 for the Def_balance variable.
With lots of thanks in advance.
regards,
Tony
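The clarified rule uses disc_amt only when the balance is low both at the first "Y" and in the month just before it. It can be sketched as a single DATA step. This is a minimal sketch, not one of the replies in this thread. It assumes the threshold is "lower than 100", as in the clarification above and in both replies. The names want_sketch and prev_low are hypothetical and used only here. The six observations are repeated so that the block runs on its own.

```sas
/* Six observations from the thread, one observation per line */
data I_have;
   input Dfm1 $ Dfm2 $ Dfm3 $ Dfm4 $ bal1 bal2 bal3 bal4 disc_amt;
   datalines;
y y y y 200 5 33 50 40
N N N y 100 44 22 24 50
N N N y 42 22 22 300 500
N N Y N 55 200 300 100 12
N N y y 500 99 15 400 14
N N N y 100 44 200 24 50
;
run;

/* want_sketch and prev_low are hypothetical names used only in this sketch */
data want_sketch (drop=j prev_low);
   set I_have;
   array LDF{*} $ dfm1-dfm4;
   array Lbal{*} bal1-bal4;
   do j = 1 to dim(LDF);
      if upcase(LDF{j}) = "Y" then do;
         Def_month = j;
         /* Month 1 has no earlier month, so only its own balance is tested. */
         /* The separate IF avoids referencing Lbal{0} when j = 1.           */
         prev_low = 1;
         if j > 1 then prev_low = (Lbal{j-1} < 100);
         /* Assumed threshold: "lower than 100" for both balances */
         if Lbal{j} < 100 and prev_low then Def_balance = disc_amt;
         else Def_balance = Lbal{j};
         leave;
      end;
   end;
run;
```

For these six observations, Def_balance comes out as 200, 50, 300, 300, 14 and 24. The sixth value is the 24 expected above. Note that in SAS a missing balance compares as less than 100, so a missing value would count as "low" under this sketch.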
I'm not sure I follow what you are trying to accomplish, as your use of the term "lag" differs from its typical use in SAS which implies across records. Does the following satisfy your extra condition?
Data I_want;
set I_have;
array LDF{*}$ dfm1-dfm4;
array Lbal{*} bal1-bal4;
do j=1 to dim(LDF);
if upcase(LDF(j)) ="Y" then do;
Def_month=j;
use_disc_amt=disc_amt;
if j gt 1 and lbal(j) lt 100 and lbal(j-1) ge 100
then use_disc_amt=lbal(j);
Def_balance=ifn(lbal(j) lt 100,use_disc_amt,Lbal(j));
leave;
end;
end;
run;
Data I_want (drop=j);
set I_have;
array LDF{*}$ dfm1-dfm4;
array Lbal{*} bal1-bal4;
do j=1 to dim(LDF);
if upcase(LDF(j)) ="Y" then do;
Def_month=j;
if lbal(j) ge 100 then def_balance=lbal(j);
else if j>1 and lbal(j-1) ge 100 then def_balance=lbal(j);
else def_balance=disc_amt;
leave;
end;
end;
run;
Great! I am very thankful to both Art and Linlin. Both programs were of great help.
Thanks once again.
Just a little change for the sake of fun; otherwise, the programs above do not need any change:
Data I_want;
set I_have;
array LDF{*}$ dfm1-dfm4;
array Lbal{*} bal1-bal4;
do j=1 to dim(LDF);
if upcase(LDF(j)) ="Y" then do;
Def_month=j;
use_disc_amt=disc_amt;
if j gt 1 and lbal(j) lt 100 and lbal(j-1) ge 100
then use_disc_amt=lbal(j);
Def_balance=ifn(lbal(j) lt 100,use_disc_amt,Lbal(j));
leave;
end;
end;
run;
I'm surprised no one suggested the WHICHC() function.
Peter, I agree that whichc would have simplified the code, but I couldn't figure out how to use it and simultaneously upcase the variables being checked.
Art
You are right!
When the case of the Y is uncertain, WHICHC() provides no easy way to ignore it (unlike FIND(), which has modifiers for that).
However, I'm surprised that these indicators have uncertain case. It is the kind of problem we would remove as the data are loaded (the $upcase. informat is simple).
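Here is a sketch of removing the case problem as the data are loaded. The $UPCASEw. informat can be applied to the flags directly in the INPUT statement. It assumes each flag is a single character, hence the width of 1. The colon modifier keeps the usual blank-delimited list-input behaviour.

```sas
data I_have;
   /* Read the four flags with the $UPCASE1. informat so that "y" is stored */
   /* as "Y"; the colon keeps blank-delimited (list input) reading.         */
   input (Dfm1-Dfm4) (:$upcase1.) bal1-bal4 disc_amt;
   datalines;
y y y y 200 5 33 50 40
N N N y 100 44 22 24 50
N N N y 42 22 22 300 500
N N Y N 55 200 300 100 12
N N y y 500 99 15 400 14
;
run;
```

With the flags stored in one case, a plain comparison such as LDF(j)="Y" (or WHICHC) is enough, and no UPCASE call is needed later.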
I don't think we would recommend carrying information in the distinction between "y" and "Y".
When case is uncertain, I would recommend (for clarity rather than performance) a data step view to upper-case them all:
data I_view / view= I_view ;
set I_have ;
array upp dfm: ;
do over upp ;
upp = upcase( upp ) ;
end ;
run ;
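With the flags upper-cased by the view above, WHICHC() can locate the first Y without a loop. This is a minimal sketch that assumes the I_view view from the step above has already been defined. The names want_whichc and prev_low are hypothetical. WHICHC returns 0 when no argument matches, so in this sketch Def_month is 0 (not missing) for an observation without any Y, and Def_balance is left missing.

```sas
data want_whichc (drop=prev_low);
   set I_view;                          /* flags already upper-cased by the view */
   array Lbal{*} bal1-bal4;
   /* WHICHC returns the position of the first argument equal to "Y", or 0 */
   Def_month = whichc("Y", of dfm1-dfm4);
   if Def_month > 0 then do;
      prev_low = 1;                     /* month 1 has no earlier month */
      if Def_month > 1 then prev_low = (Lbal{Def_month-1} < 100);
      if Lbal{Def_month} < 100 and prev_low then Def_balance = disc_amt;
      else Def_balance = Lbal{Def_month};
   end;
run;
```

This addresses Art's difficulty by doing the upper-casing once, in the view, so that the case-sensitive WHICHC() can be used as is.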
For performance, alternatives come to mind, such as Def_month = find( cats( of dfm: ), 'y', 'i' ) ; where the 'i' modifier makes FIND() ignore case, as in:
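data want ;
   set I_have ;
   /* CATS joins the one-character flags (e.g. "NNYN"); FIND with the 'i'   */
   /* modifier returns the position of the first y or Y, or 0 if there is  */
   /* none. The position equals the month only because each flag is 1 char */
   Def_month = find( cats( of dfm: ), 'y', 'i' ) ;
   retain dum 0 ;
   /* bal(1) is DUM, so bal(def_month+1) is the balance at the first Y */
   /* and bal(def_month) is the balance in the month before it         */
   array bal(*) dum bal: ;
   if not def_month then call missing( def_balance ) ;
   else
   if max( bal( def_month ), bal( def_month+1 ) ) < 100
   then def_balance = disc_amt ;
   else def_balance = bal( def_month+1 ) ;
   drop dum ;
run ;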
I added DUM before the balances in the array to avoid special handling when def_month=1: DUM is retained as 0, so for def_month=1 the test reduces to bal1 < 100.