Help using Base SAS procedures

data processing

Reply
N/A
Posts: 0

data processing

Hi All,

I have a dataset in the following format:
year bank_ID cash_balance

2000 1 0
2000 2 0
2000 3 0
2000 4 0
2001 1 10
2001 2 20
2001 3 0
2001 4 5
2002 1 15
2002 2 22
2002 3 5
2002 4 7
.
.
.
2009.......

Now, I would like to create three variables, time_1, time_2, time_3. These three time variables represent presence or absense (0/1) of a bank balance of >$0 for a particular bankID in the previous year, in two years previous, and three year previous.

Example 1. When SAS is evaluating year 2000, then it should create variables temp_1, temp_2, and temp_3 based on bank balances for the unique bankIDs during years 1999, 1998, and 1997. Since data for years 1997-1999 are missing, temp_1, temp_2, and temp_3 values for year 2000 should have missing values. In case of year 2001, data for years 1998 and 1999 are missing, so temp_2 and temp_3 values for year 2001 should be set to missing.

Example 2. For year 2003 and bankID 1, bank balance for year 2002 (temp_1) is 15, so temp_1 would be 1, however for 2000 (temp_3) the bank_balance is 0, so value of temp_3 would be 0.

Any suggestions regarding an appropriate SAS program to process this data will be highly appreciated. Thank you.
SAS Employee
Posts: 174

Re: data processing

Posted in reply to deleted_user
[pre]data input;
input year bank_ID cash_balance;
datalines;
2000 1 0
2000 2 0
2000 3 0
2000 4 0
2001 1 10
2001 2 20
2001 3 0
2001 4 5
2002 1 15
2002 2 22
2002 3 5
2002 4 7
2003 1 0
2003 2 0
2003 3 0
2003 4 0
2004 1 10
2004 2 20
2004 3 0
2004 4 5
2005 1 15
2005 2 22
2005 3 5
2005 4 7
;
run;

proc sort data=input;
by bank_id year;
run;

data lag;
set input;
time_1 = max(lag1(cash_balance),0);
time_2 = max(lag2(cash_balance),0);
time_3 = max(lag3(cash_balance),0);
run;[/pre]
Respected Advisor
Posts: 4,173

Re: data processing

Posted in reply to GertNissen
Looking at your data I think Geniz solution comes only close to what you asked for.

May be this might help (not tested):

proc format;
value bool
>0 =1
;
run;

data lag;
set input;
time_1 = input(put(lag1(cash_balance),bool.),8.);
time_1 = input(put(lag2(cash_balance),bool.),8.);
time_1 = input(put(lag3(cash_balance),bool.),8.);
run;
N/A
Posts: 0

Re: data processing

Thanks Geniz and Patrick. I will work on your suggestions and get back to the forum very soon. Any alternate suggestions from other forum members are most welcome.
Respected Advisor
Posts: 4,173

Re: data processing

Posted in reply to deleted_user
Tss, tss... and what both solutions where missing so far is what I included now in the end of the code.
Not very elegant - but it should work.

data input;
input year bank_ID cash_balance;
datalines;
2000 1 0
2000 2 0
2000 3 0
2000 4 0
2001 1 10
2001 2 20
2001 3 0
2001 4 5
2002 1 15
2002 2 22
2002 3 5
2002 4 7
2003 1 0
2003 2 0
2003 3 0
2003 4 0
2004 1 10
2004 2 20
2004 3 0
2004 4 5
2005 1 15
2005 2 22
2005 3 5
2005 4 7
;
run;

proc sort data=input;
by bank_id year;
run;

proc format;
value bool
>0 =1
;
run;

data lag;
set input;
by bank_id year;
time_1 = input(put(lag1(cash_balance),bool.),8.);
time_2 = input(put(lag2(cash_balance),bool.),8.);
time_3 = input(put(lag3(cash_balance),bool.),8.);

if bank_id ne lag1(bank_id) then time_1=.;
if bank_id ne lag2(bank_id) then time_2=.;
if bank_id ne lag3(bank_id) then time_3=.;
run; Message was edited by: Patrick
Respected Advisor
Posts: 3,799

Re: data processing

Normalizing Boolean and initializing LAGS at BY group breaks.

Using the technique of this sample program http://support.sas.com/kb/24/694.html and using a double negative to normalize the flags I suggest this small modification to your program.

[pre]
data lag;
set input;
by bank_id year;
array time_[3];
time_1 = not not lag1(cash_balance);
time_2 = not not lag2(cash_balance);
time_3 = not not lag3(cash_balance);
if first.bank_id then count = 0;
count + 1;
do _n_ = count to dim(time_);
time_[_n_] = .;
end;
drop count;
run;
proc print;
by bank_id;
id bank_id;
run;

cash_
bank_ID year balance time_1 time_2 time_3

1 2000 0 . . .
2001 10 0 . .
2002 15 1 0 .
2003 0 1 1 0
2004 10 0 1 1
2005 15 1 0 1


2 2000 0 . . .
2001 20 0 . .
2002 22 1 0 .
2003 0 1 1 0
2004 20 0 1 1
2005 22 1 0 1


3 2000 0 . . .
2001 0 0 . .
2002 5 0 0 .
2003 0 1 0 0
2004 0 0 1 0
2005 5 0 0 1


4 2000 0 . . .
2001 5 0 . .
2002 7 1 0 .
2003 0 1 1 0
2004 5 0 1 1
2005 7 1 0 1
[/pre]
Ask a Question
Discussion stats
  • 5 replies
  • 136 views
  • 0 likes
  • 4 in conversation