SAS Array Creating A New Dataset And Matching

kk13 · Posted 10-02-2020 02:31 AM

I have a data set which has account numbers and balances in 2018 and 2020. I found the account numbers in 2019 which has some accounts from 2018 and 2020. I would like to create an array of balances for 2019, where if the 2019 account number matches with either 2018 or 2020, then pick the balance 2018. (Data Want shows what I would like!)

For example, the first observation has account number 101 in 2019. This account was present in both 2018 and 2020. Therefore, I would like the 2019 balance (bal2019_1) to equal (bal2018_1).

data have;

input acct2018_1 acct2018_2 acct2018_3 acct2018_4 acct2018_5 bal2018_1 bal2018_2 bal2018_3 bal2018_4 bal2018_5

acct2020_1 acct2020_2 acct2020_3 acct2020_4 acct2020_5 bal2020_1 bal2020_2 bal2020_3 bal2020_4 bal2020_5;

datalines;

101 407 103 . . 40 60 80 . . 101 604 505 . . 10 19 20 . .

303 203 . . . 70 80 . . . 507 205 406 907 . 14 19 89 99 .

901 602 801 . . 10 14 24 . . 404 901 505 802 . 20 90 99 87 .

301 501 904 905 . 84 90 20 95 . 808 . . . . 34 . . . . .

;

run;

data account_balance2019;

input acct2019_1 acct2019_2 acct2019_3 acct2019_4 acct2019_5 ;

datalines;

101 . . . .

907 203 . . .

801 505 901 . .

808 301 . . .

;

run;

data want;

input acct2018_1 acct2018_2 acct2018_3 acct2018_4 acct2018_5 bal2018_1 bal2018_2 bal2018_3 bal2018_4 bal2018_5

acct2019_1 acct2019_2 acct2019_3 acct2019_4 acct2019_5 bal2019_1 bal2019_2 bal2019_3 bal2019_4 bal2019_5

acct2020_1 acct2020_2 acct2020_3 acct2020_4 acct2020_5 bal2020_1 bal2020_2 bal2020_3 bal2020_4 bal2020_5;

datalines;

101 407 103 . . 40 60 80 . . 101 . . . . 40 . . . . . 101 604 505 . . 10 19 20 . .

303 203 . . . 70 80 . . . 907 203 . . . 99 80 . . . 507 205 406 907 . 14 19 89 99 .

901 602 801 . . 10 14 24 . . 801 505 901 . . 24 99 10 . . 404 901 505 802 . 20 90 99 87 .

301 501 904 905 . 84 90 20 95 . 808 301 . . . 34 84 . . . 808 . . . . 34 . . . . .

;

run;

Oligolas · Posted 10-02-2020 07:20 AM

Hi,

use arrays like this:

data have1;
   set have;
   id+1;
run;
data ab1;
   set account_balance2019;
   id+1;
run;

data hab0;
   merge have1 ab1;
   by id;
run;


data hab1;
   length bal2019_1 bal2019_2 bal2019_3 bal2019_4 bal2019_5 8;
   set hab0;
   call missing(bal2019_1,bal2019_2,bal2019_3,bal2019_4,bal2019_5);
   array a18{5} acct2018_1-acct2018_5;
   array a19{5} acct2019_1-acct2019_5;
   array a20{5} acct2020_1-acct2020_5;

   array b18{5} bal2018_1-bal2018_5;
   array b19{5} bal2019_1-bal2019_5;
   array b20{5} bal2020_1-bal2020_5;

   do i=1 to 5;
      if a19[i] ne . then do;
         do j=1 to 5; *scan all 2018 values;
            if a19[i] eq a18[j] then do;
               b19[i]=b18[j];
               leave; *stop loop if value found;
            end;          
         end;
         
         if b19[i] eq . then do; *if still no value found in 2018;
            do k=1 to 5; *scan all 2020 values;
               if a19[i] eq a20[k] then do;
                  b19[i]=b20[k]; *stop loop if value found;
                  leave;
               end;
            end;
         end;
      end;
   end;
   drop id i j k;
run;

*reordering the items;
PROC SQL noprint;
   select name into :names separated by ' '
   from sashelp.vcolumn
   where libname eq 'WORK'
   and memname eq 'HAB1'
   order by scan(compress(name,,'Aa'),1,'_'), substr(name,1,1), scan(name,2,'_')
   ;
QUIT;

DATA Want181920;
   ATTRIB &names. label='';
   set hab1;
RUN;
%symdel names;

________________________
- Cheers -

Kurt_Bremser · Posted 10-02-2020 09:56 AM

This is another classical example for bad data modeling forcing crap (expletive used intentionally!) code.

The first thing you do is transpose those unusable wide datasets to a long layout:

data have;
input acct2018_1 acct2018_2 acct2018_3 acct2018_4 acct2018_5 bal2018_1 bal2018_2 bal2018_3 bal2018_4 bal2018_5
 
acct2020_1 acct2020_2 acct2020_3 acct2020_4 acct2020_5 bal2020_1 bal2020_2 bal2020_3 bal2020_4 bal2020_5;
datalines;
101 407 103 . . 40 60 80 . . 101 604 505 . . 10 19 20 . .
303 203 . . . 70 80 . . . 507 205 406 907 . 14 19 89 99 .
901 602 801 . . 10 14 24 . . 404 901 505 802 . 20 90 99 87 .
301 501 904 905 . 84 90 20 95 . 808 . . . . 34 . . . . .
;

data account_balance2019;
input acct2019_1 acct2019_2 acct2019_3 acct2019_4 acct2019_5 ;
datalines;
101 . . . .
907 203 . . .
801 505 901 . .
808 301 . . .
;

data have_wide;
set have;
n = _n_;
run;

proc transpose data=have_wide out=_have_long_a (rename=(col1=account));
by n;
var acct:;
run;

proc transpose data=have_wide out=_have_long_b (rename=(col1=balance));
by n;
var bal:;
run;

data have_long_a;
set _have_long_a;
year = substr(_name_,5,4);
seq = substr(_name_,10);
drop _name_;
run;

data have_long_b;
set _have_long_b;
year = substr(_name_,4,4);
seq = substr(_name_,9);
drop _name_;
run;

proc sql;
create table have_long as
  select a.account, a.year, b.balance
  from have_long_a a, have_long_b b
  where a.n = b.n and a.year = b.year and a.seq = b.seq
    and a.account ne . and b.balance ne .
  order by a.account, a.year
;
quit;

data ab_wide;
set account_balance2019;
n = _n_;
run;

proc transpose data=ab_wide out=_ac_2019 (rename=(col1=account) where=(account ne .));
by n;
var acct:;
run;

proc sort
  data=_ac_2019 (keep=account)
  out=ac_2019
;
by account;
run;

(original data step code kept for reference)

The datasets now look like this:

account	year	balance
101	2018	40
101	2020	10
103	2018	80
203	2018	80
205	2020	19
301	2018	84
303	2018	70
404	2020	20
406	2020	89
407	2018	60
501	2018	90
505	2020	20
505	2020	99
507	2020	14
602	2018	14
604	2020	19
801	2018	24
802	2020	87
808	2020	34
901	2018	10
901	2020	90
904	2018	20
905	2018	95
907	2020	99

and this:

Note that all processing up to this point is only done to bring the data into a shape it should have had from the beginning; either avoid any transposition to wide altogether, or if you get data in such a format, do the transpose when you load it into SAS.

(Hint: if you want to display data in a wide format for human consumption, SAS procedures (TABULATE or REPORT) can do that for you on the fly)

Now, the code is very simple:

data want;
merge
  have_long (in=h)
  have_long (
    in=ac2020
    keep=account year
    rename=(year=_y)
    where=(_y = "2020")
  )
  ac_2019 (in=ac2019)
;
by account;
if h;
output;
if year = '2018' and ac2019 and ac2020
then do;
  year = "2019";
  output;
end;
drop _y;
run;

Also note that the whole code makes no assumptions about the number of variables in your wide datasets, only that the names start with "acct" and "bal".

If you need to do this for different year constellations, you just need to replace the "2019" and "2020" in the code with macro variables.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

SAS Array Creating A New Dataset And Matching

Re: SAS Array Creating A New Dataset And Matching

Re: SAS Array Creating A New Dataset And Matching

Catch up on SAS Innovate 2026

SAS Array Creating A New Dataset And Matching

Re: SAS Array Creating A New Dataset And Matching

Re: SAS Array Creating A New Dataset And Matching

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away