Hi, I dont know how to describe in the subject but here is what I'm struggling now.
So I have this big table (24 months average balance (see below sample)
id |active_dt|avgbal_201804|avgbal_201805|.........|avgbal_201902 001| 201807| 5000| 5300|.........| 7000 002| 201901| 3200| 3000|.........| 4500
I need to get avgbal of 3 months before and 3 months after from the active_dt
The new fields name are avgbal_minus3, avgbal_minus2, avgbal_minus1, avgbal_plus1, avgbal_plus2, avgbal_plus3
id 001 should have
avgbal_minus3= take from avgbal_201804
avgbal_minus2= take from avgbal_201805
avgbal_minus1= take from avgbal_201806
avgbal_plus1= take from avgbal_201808
avgbal_plus2= take from avgbal_201809
avgbal_plus3= take from avgbal_201810
while id 002 will take from avgbal_201810, avgbal_201811, avgbal_201812 and avgbal_201902 (this id will have blank avgbal_plus2 and avgbal_plus3 since the data only up to February.
The final table should look like below
id |active_dt|avgbal_minus3|avgbal_minus2|avgbal_minus1|avgbal_plus1|avgbal_plus2|avgbal_plus3|
001| 201807| 5000| 5300| XXXX| XXXX| XXXX| XXXX|
002| 201901| XXXX| XXXX| XXXX| 4500| | |
Please help
Thanks
Although transposing the data and getting rid of data in variable names, as suggested by @Kurt_Bremser, is the recommended why to solve the problem, your could skip transposing:
data work.have;
length id $ 3 active_dt 8
avgbal_201804 avgbal_201805 avgbal_201806 avgbal_201807 avgbal_201808 avgbal_201809
avgbal_201810 avgbal_201811 avgbal_201812 avgbal_201901 avgbal_201902 8
;
input id active_dt avgbal_201804 avgbal_201805 avgbal_201806 avgbal_201807 avgbal_201808 avgbal_201809
avgbal_201810 avgbal_201811 avgbal_201812 avgbal_201901 avgbal_201902;
datalines;
001 201807 5000 5300 . . . . . . . . 7000
002 201901 3200 3000 . . . . . . . . 4500
;
run;
/* Get the names of avgval-variables */
proc sql noprint;
select Name
into :avgList separated by '#'
from sashelp.vcolumn
where LibName = 'WORK' and MemName = 'HAVE' and Name like 'avgbal%'
;
quit;
/* to distinguish between missing data and missing variable and to format both according to specification */
proc format;
value avgbal
LOW-HIGH = [best32.]
. = 'XXXX' /* data is missing in value */
.v = ' ' /* variable is missing */
;
run;
data work.want;
set work.have;
length
avgbal_minus3 avgbal_minus2 avgbal_minus1 avgbal_plus1 avgbal_plus2 avgbal_plus3 8
active_date i dummy 8
varname $ 32
;
keep id active_dt avgbal_minus3 avgbal_minus2 avgbal_minus1 avgbal_plus1 avgbal_plus2 avgbal_plus3;
/* dummy is required to get a continuous array */
retain dummy 0;
format active_date yymmn6.;
array targets[-3:3] avgbal_minus3 avgbal_minus2 avgbal_minus1 dummy avgbal_plus1 avgbal_plus2 avgbal_plus3;
active_date = input(cats(active_dt, '01'), yymmdd8.);
do i = -3 to 3;
if i ^= 0 then do;
varname = cats('avgbal_', put(intnx('month', active_date, i), yymmn6.));
if findw("&avgList.", varname, '#', 'it') then do;
targets[i] = vvaluex(varname);
end;
else do;
targets[i] = .v;
end;
end;
end;
run;
proc print data=work.want;
format avgbal: avgbal.;
run;
First, transpose the data to a long format where you have id, active_dt, balance_dt (derived form the avgbal variable names) and balance.
You can then retrieve the balance grouped by id and active_dt:
data have;
infile datalines dlm='|';
input id active_dt :yymmn6. avgbal_201804 avgbal_201805 avgbal_201806 avgbal_201807 avgbal_201808 avgbal_201809 avgbal_201810 avgbal_201811 avgbal_201812 avgbal_201901 avgbal_201902;
format active_dt yymmddd10.;
datalines;
001|201807|5000|5300|5500|5600|5700|5800|5900|6000|6200|6500|7000
;
run;
proc transpose data=have out=trans;
by id active_dt;
var avgbal:;
run;
data trans2;
set trans;
balance_dt = input(scan(_name_,2,'_'),yymmn6.);
format balance_dt yymmddd10.;
drop _name_;
rename col1=avg_balance;
run;
data want;
set trans2;
by id active_dt;
retain avgbal_minus3 avgbal_minus2 avgbal_minus1 avgbal_plus1 avgbal_plus2 avgbal_plus3;
array balances_minus {3} avgbal_minus1-avgbal_minus3;
array balances_plus {3} avgbal_plus1-avgbal_plus3;
if first.active_dt
then do i = 1 to 3;
balances_minus{i} = .;
balances_plus{i} = .;
end;
do i = 1 to 3;
if intck('month',active_dt,balance_dt) = i then balances_plus{i} = avg_balance;
if intck('month',balance_dt,active_dt) = i then balances_minus{i} = avg_balance;
end;
if last.active_dt then output;
keep id active_dt avgbal_minus: avgbal_plus:;
run;
Note that a "long" dataset format without data (dates in your case) in structure (variable names) is always easier to code against.
See Maxims 19 & 33.
Although transposing the data and getting rid of data in variable names, as suggested by @Kurt_Bremser, is the recommended why to solve the problem, your could skip transposing:
data work.have;
length id $ 3 active_dt 8
avgbal_201804 avgbal_201805 avgbal_201806 avgbal_201807 avgbal_201808 avgbal_201809
avgbal_201810 avgbal_201811 avgbal_201812 avgbal_201901 avgbal_201902 8
;
input id active_dt avgbal_201804 avgbal_201805 avgbal_201806 avgbal_201807 avgbal_201808 avgbal_201809
avgbal_201810 avgbal_201811 avgbal_201812 avgbal_201901 avgbal_201902;
datalines;
001 201807 5000 5300 . . . . . . . . 7000
002 201901 3200 3000 . . . . . . . . 4500
;
run;
/* Get the names of avgval-variables */
proc sql noprint;
select Name
into :avgList separated by '#'
from sashelp.vcolumn
where LibName = 'WORK' and MemName = 'HAVE' and Name like 'avgbal%'
;
quit;
/* to distinguish between missing data and missing variable and to format both according to specification */
proc format;
value avgbal
LOW-HIGH = [best32.]
. = 'XXXX' /* data is missing in value */
.v = ' ' /* variable is missing */
;
run;
data work.want;
set work.have;
length
avgbal_minus3 avgbal_minus2 avgbal_minus1 avgbal_plus1 avgbal_plus2 avgbal_plus3 8
active_date i dummy 8
varname $ 32
;
keep id active_dt avgbal_minus3 avgbal_minus2 avgbal_minus1 avgbal_plus1 avgbal_plus2 avgbal_plus3;
/* dummy is required to get a continuous array */
retain dummy 0;
format active_date yymmn6.;
array targets[-3:3] avgbal_minus3 avgbal_minus2 avgbal_minus1 dummy avgbal_plus1 avgbal_plus2 avgbal_plus3;
active_date = input(cats(active_dt, '01'), yymmdd8.);
do i = -3 to 3;
if i ^= 0 then do;
varname = cats('avgbal_', put(intnx('month', active_date, i), yymmn6.));
if findw("&avgList.", varname, '#', 'it') then do;
targets[i] = vvaluex(varname);
end;
else do;
targets[i] = .v;
end;
end;
end;
run;
proc print data=work.want;
format avgbal: avgbal.;
run;
I keep getting V as the value. Do I miss something here?
Thanks
Run @andreas_lds' code AS IS, and you'll get this result:
|
Keep in mind that we have no clue how your original dataset really looks, as you did not provide it in usable form (data step with datalines).
Got it.
Because I'm using the original table but didnt change the table name at avgList part.
Thank you so much!!!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.