I have a data set which has account numbers and balances in 2018 and 2020. I found the account numbers in 2019 which has some accounts from 2018 and 2020. I would like to create an array of balances for 2019, where if the 2019 account number matches with either 2018 or 2020, then pick the balance 2018. (Data Want shows what I would like!)
For example, the first observation has account number 101 in 2019. This account was present in both 2018 and 2020. Therefore, I would like the 2019 balance (bal2019_1) to equal (bal2018_1).
data have;
input acct2018_1 acct2018_2 acct2018_3 acct2018_4 acct2018_5 bal2018_1 bal2018_2 bal2018_3 bal2018_4 bal2018_5
acct2020_1 acct2020_2 acct2020_3 acct2020_4 acct2020_5 bal2020_1 bal2020_2 bal2020_3 bal2020_4 bal2020_5;
datalines;
101 407 103 . . 40 60 80 . . 101 604 505 . . 10 19 20 . .
303 203 . . . 70 80 . . . 507 205 406 907 . 14 19 89 99 .
901 602 801 . . 10 14 24 . . 404 901 505 802 . 20 90 99 87 .
301 501 904 905 . 84 90 20 95 . 808 . . . . 34 . . . . .
;
run;
data account_balance2019;
input acct2019_1 acct2019_2 acct2019_3 acct2019_4 acct2019_5 ;
datalines;
101 . . . .
907 203 . . .
801 505 901 . .
808 301 . . .
;
run;
data want;
input acct2018_1 acct2018_2 acct2018_3 acct2018_4 acct2018_5 bal2018_1 bal2018_2 bal2018_3 bal2018_4 bal2018_5
acct2019_1 acct2019_2 acct2019_3 acct2019_4 acct2019_5 bal2019_1 bal2019_2 bal2019_3 bal2019_4 bal2019_5
acct2020_1 acct2020_2 acct2020_3 acct2020_4 acct2020_5 bal2020_1 bal2020_2 bal2020_3 bal2020_4 bal2020_5;
datalines;
101 407 103 . . 40 60 80 . . 101 . . . . 40 . . . . . 101 604 505 . . 10 19 20 . .
303 203 . . . 70 80 . . . 907 203 . . . 99 80 . . . 507 205 406 907 . 14 19 89 99 .
901 602 801 . . 10 14 24 . . 801 505 901 . . 24 99 10 . . 404 901 505 802 . 20 90 99 87 .
301 501 904 905 . 84 90 20 95 . 808 301 . . . 34 84 . . . 808 . . . . 34 . . . . .
;
run;
Hi,
use arrays like this:
data have1; set have; id+1; run; data ab1; set account_balance2019; id+1; run; data hab0; merge have1 ab1; by id; run; data hab1; length bal2019_1 bal2019_2 bal2019_3 bal2019_4 bal2019_5 8; set hab0; call missing(bal2019_1,bal2019_2,bal2019_3,bal2019_4,bal2019_5); array a18{5} acct2018_1-acct2018_5; array a19{5} acct2019_1-acct2019_5; array a20{5} acct2020_1-acct2020_5; array b18{5} bal2018_1-bal2018_5; array b19{5} bal2019_1-bal2019_5; array b20{5} bal2020_1-bal2020_5; do i=1 to 5; if a19[i] ne . then do; do j=1 to 5; *scan all 2018 values; if a19[i] eq a18[j] then do; b19[i]=b18[j]; leave; *stop loop if value found; end; end; if b19[i] eq . then do; *if still no value found in 2018; do k=1 to 5; *scan all 2020 values; if a19[i] eq a20[k] then do; b19[i]=b20[k]; *stop loop if value found; leave; end; end; end; end; end; drop id i j k; run;
*reordering the items; PROC SQL noprint; select name into :names separated by ' ' from sashelp.vcolumn where libname eq 'WORK' and memname eq 'HAB1' order by scan(compress(name,,'Aa'),1,'_'), substr(name,1,1), scan(name,2,'_') ; QUIT; DATA Want181920; ATTRIB &names. label=''; set hab1; RUN; %symdel names;
- Cheers -
This is another classical example for bad data modeling forcing crap (expletive used intentionally!) code.
The first thing you do is transpose those unusable wide datasets to a long layout:
data have;
input acct2018_1 acct2018_2 acct2018_3 acct2018_4 acct2018_5 bal2018_1 bal2018_2 bal2018_3 bal2018_4 bal2018_5
acct2020_1 acct2020_2 acct2020_3 acct2020_4 acct2020_5 bal2020_1 bal2020_2 bal2020_3 bal2020_4 bal2020_5;
datalines;
101 407 103 . . 40 60 80 . . 101 604 505 . . 10 19 20 . .
303 203 . . . 70 80 . . . 507 205 406 907 . 14 19 89 99 .
901 602 801 . . 10 14 24 . . 404 901 505 802 . 20 90 99 87 .
301 501 904 905 . 84 90 20 95 . 808 . . . . 34 . . . . .
;
data account_balance2019;
input acct2019_1 acct2019_2 acct2019_3 acct2019_4 acct2019_5 ;
datalines;
101 . . . .
907 203 . . .
801 505 901 . .
808 301 . . .
;
data have_wide;
set have;
n = _n_;
run;
proc transpose data=have_wide out=_have_long_a (rename=(col1=account));
by n;
var acct:;
run;
proc transpose data=have_wide out=_have_long_b (rename=(col1=balance));
by n;
var bal:;
run;
data have_long_a;
set _have_long_a;
year = substr(_name_,5,4);
seq = substr(_name_,10);
drop _name_;
run;
data have_long_b;
set _have_long_b;
year = substr(_name_,4,4);
seq = substr(_name_,9);
drop _name_;
run;
proc sql;
create table have_long as
select a.account, a.year, b.balance
from have_long_a a, have_long_b b
where a.n = b.n and a.year = b.year and a.seq = b.seq
and a.account ne . and b.balance ne .
order by a.account, a.year
;
quit;
data ab_wide;
set account_balance2019;
n = _n_;
run;
proc transpose data=ab_wide out=_ac_2019 (rename=(col1=account) where=(account ne .));
by n;
var acct:;
run;
proc sort
data=_ac_2019 (keep=account)
out=ac_2019
;
by account;
run;
(original data step code kept for reference)
The datasets now look like this:
account year balance 101 2018 40 101 2020 10 103 2018 80 203 2018 80 205 2020 19 301 2018 84 303 2018 70 404 2020 20 406 2020 89 407 2018 60 501 2018 90 505 2020 20 505 2020 99 507 2020 14 602 2018 14 604 2020 19 801 2018 24 802 2020 87 808 2020 34 901 2018 10 901 2020 90 904 2018 20 905 2018 95 907 2020 99
and this:
account 101 203 301 505 801 808 901 907
Note that all processing up to this point is only done to bring the data into a shape it should have had from the beginning; either avoid any transposition to wide altogether, or if you get data in such a format, do the transpose when you load it into SAS.
(Hint: if you want to display data in a wide format for human consumption, SAS procedures (TABULATE or REPORT) can do that for you on the fly)
Now, the code is very simple:
data want;
merge
have_long (in=h)
have_long (
in=ac2020
keep=account year
rename=(year=_y)
where=(_y = "2020")
)
ac_2019 (in=ac2019)
;
by account;
if h;
output;
if year = '2018' and ac2019 and ac2020
then do;
year = "2019";
output;
end;
drop _y;
run;
Also note that the whole code makes no assumptions about the number of variables in your wide datasets, only that the names start with "acct" and "bal".
If you need to do this for different year constellations, you just need to replace the "2019" and "2020" in the code with macro variables.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.