I have a table that looks like this:
ID | EXPENDED_YR_1 | EXPENDED_AMOUNT_1 | EXPENDED_YR_2 | EXPENDED_AMOUNT_2 | EXPENDED_YR_3 | EXPENDED_AMOUNT_3 | EXPENDED_YR_4 | EXPENDED_AMOUNT_4 | EXPENDED_YR_5 | EXPENDED_AMOUNT_5 |
197168814080 | 2019 | 1223.71 | 2019 | 2832.47 | ||||||
197170819072 | 2018 | 6000 | ||||||||
197174228992 | 2018 | 45532.25 | 2019 | 8493.22 | 2019 | 656.55 | 2019 | 24455.78 | 2019 | 8499.91 |
197174243328 | ||||||||||
197169281024 | 2019 | 1024.01 | 2019 | 1890.14 | 2019 | 1046.24 |
There is an unique ID number with 5 expended_year columns and 5 expended_amount columns. The numbers on the end of each column name correspond with each other. So Expended_YR_1 corresponds to Expended_amount_1, Expended_YR_2 corresponds to Expended_amount_2, etc..
The desired output should look like this:
ID | ExpendituresYTD_2018 | ExpendituresYTD_2019 |
197168814080 | 0 | 4056.18 |
197170819072 | 6000 | 0 |
197174228992 | 45532.25 | 42105.46 |
197174243328 | 0 | 0 |
197169281024 | 0 | 3960.39 |
So summing by year for each record.
Thank you for your help in advance.
Something in this vein would do:
data have ;
input id expended_yr_1 expended_amount_1
expended_yr_2 expended_amount_2
expended_yr_3 expended_amount_3
expended_yr_4 expended_amount_4
expended_yr_5 expended_amount_5
;
cards ;
197168814080 2019 1223.71 2019 2832.47 . . . . . .
197170819072 2018 6000 . . . . . . . .
197174228992 2018 45532.25 2019 8493.22 2019 656.55 2019 24455.78 2019 8499.91
197174243328 . . . . . . . . . .
197169281024 2019 1024.01 2019 1890.14 2019 1046.24 . . . .
;
run ;
data want (keep = id expenditure:) ;
set have ;
array yr expended_yr: ;
array am expended_am: ;
expendituresytd_2018 = 0 ;
expendituresytd_2019 = 0 ;
do over yr ;
if yr = 2018 then expendituresytd_2018 + am ;
else if yr = 2019 then expendituresytd_2019 + am ;
end ;
run ;
Kind regards
Paul D.
HI @SasPerson85
You can try this:
data have;
infile datalines dlm="09"x;
input ID EXPENDED_YR_1 EXPENDED_AMOUNT_1 EXPENDED_YR_2 EXPENDED_AMOUNT_2 EXPENDED_YR_3 EXPENDED_AMOUNT_3 EXPENDED_YR_4 EXPENDED_AMOUNT_4 EXPENDED_YR_5 EXPENDED_AMOUNT_5;
datalines;
197168814080 2019 1223.71 2019 2832.47
197170819072 2018 6000
197174228992 2020 45532.25 2019 8493.22 2019 656.55 2019 24455.78 2019 8499.91
197174243328
197169281024 2019 1024.01 2019 1890.14 2019 1046.24
;
run;
options missing="0";
data have_exp;
set have;
array EXPENDED_YR_(5);
array EXPENDED_AMOUNT_(5);
do i=1 to 5;
EXPENDED_YR = EXPENDED_YR_(i);
EXPENDED_AMOUNT = EXPENDED_AMOUNT_(i);
output;
end;
keep ID EXPENDED_YR EXPENDED_AMOUNT;
run;
proc means data=have_exp sum noprint missing;
var EXPENDED_AMOUNT;
class ID EXPENDED_YR;
ways 2;
output out=have_sum (drop=_type_ _freq_) sum=sum;
run;
data have_sum2;
set have_sum;
by ID;
if EXPENDED_YR = . then EXPENDED_YR = 0;
if SUM = . then SUM = 0;
run;
proc transpose data=have_sum2 out=want(drop=_name_ ExpendituresYTD_0) prefix=ExpendituresYTD_;
var sum;
by ID;
ID EXPENDED_YR;
run;
options missing=".";
There are much better ways to arrange the data, that would make processing much simpler.
Example:
ID Year Amount 111 2018 265.32 111 2019 1244.09 112 2018 27.22
Something in this vein would do:
data have ;
input id expended_yr_1 expended_amount_1
expended_yr_2 expended_amount_2
expended_yr_3 expended_amount_3
expended_yr_4 expended_amount_4
expended_yr_5 expended_amount_5
;
cards ;
197168814080 2019 1223.71 2019 2832.47 . . . . . .
197170819072 2018 6000 . . . . . . . .
197174228992 2018 45532.25 2019 8493.22 2019 656.55 2019 24455.78 2019 8499.91
197174243328 . . . . . . . . . .
197169281024 2019 1024.01 2019 1890.14 2019 1046.24 . . . .
;
run ;
data want (keep = id expenditure:) ;
set have ;
array yr expended_yr: ;
array am expended_am: ;
expendituresytd_2018 = 0 ;
expendituresytd_2019 = 0 ;
do over yr ;
if yr = 2018 then expendituresytd_2018 + am ;
else if yr = 2019 then expendituresytd_2019 + am ;
end ;
run ;
Kind regards
Paul D.
data have;
infile cards truncover;
input ID :$20. EXPENDED_YR_1 EXPENDED_AMOUNT_1 EXPENDED_YR_2 EXPENDED_AMOUNT_2 EXPENDED_YR_3 EXPENDED_AMOUNT_3 EXPENDED_YR_4 EXPENDED_AMOUNT_4 EXPENDED_YR_5 EXPENDED_AMOUNT_5;
cards;
197168814080 2019 1223.71 2019 2832.47
197170819072 2018 6000
197174228992 2018 45532.25 2019 8493.22 2019 656.55 2019 24455.78 2019 8499.91
197174243328
197169281024 2019 1024.01 2019 1890.14 2019 1046.24
;
/*Transpose wide to long*/
data temp;
set have;
array t(5,2) EXPENDED_YR_1--EXPENDED_AMOUNT_5;
do i=1 to 5 until(not year);
year=t(i,1);
Amt=t(i,2);
output;
end;
keep id year amt;
run;
options missing='0';
proc summary data=temp nway missing completetypes;
class ID year;
var amt;
output out=temp2(where=(year ne .) drop=_type_ _freq_) sum=;
run;
proc transpose data=temp2 out=want(drop=_:) prefix=ExpendituresYTD_;
by id ;
var amt;
id year;
run;
proc print noobs;run;
ID | ExpendituresYTD_2018 | ExpendituresYTD_2019 |
---|---|---|
197168814080 | 0 | 4056.18 |
197169281024 | 0 | 3960.39 |
197170819072 | 6000.00 | 0 |
197174228992 | 45532.25 | 42105.46 |
197174243328 | 0 | 0 |
You already have matching arrays of year variables (range from 2018 to 2020) and expenditure variables. Since you want total for a subset of those years (2018,2019), this is an excellent situation to use an array (EXPYTD below) with lower bound 2018 and upper bound 2019:
data have;
infile datalines truncover;
input ID EXPENDED_YR_1 EXPENDED_AMOUNT_1 EXPENDED_YR_2 EXPENDED_AMOUNT_2 EXPENDED_YR_3 EXPENDED_AMOUNT_3 EXPENDED_YR_4 EXPENDED_AMOUNT_4 EXPENDED_YR_5 EXPENDED_AMOUNT_5;
datalines;
197168814080 2019 1223.71 2019 2832.47
197170819072 2018 6000
197174228992 2020 45532.25 2019 8493.22 2019 656.55 2019 24455.78 2019 8499.91
197174243328
197169281024 2019 1024.01 2019 1890.14 2019 1046.24
run;
data want (drop=i);
set have;
array expytd {2018:2019} expenditures_ytd2018 expenditures_ytd2019 ;
do i=2018 to 2019; expytd{i}=0; end;
array yr expended_yr: ;
array amt expended_amount: ;
do over yr ;
if yr in (2018,2019) then expytd{yr}=sum(expytd{yr},amt);
end;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.