I've a data set with a long list of numerical variables with spaces (all starting with numerical 7). I need to sum up some of the variables into 5-6 new variables. Here is a sample data with just 4 variables.
data have;
input FacNo 1 '71190 food'n 3-5 '71230 adm act'n 7-9 '71340 occ therapy'n 11-13 '71350 physio therapy'n 15-16;
datalines;
1 100 600
2 700 400 400
3 300 200
4 750 40
5 350 150
6 500 400 50
7 250 70
;
proc print noobs; run;
Here is what I'd like to have
FacNo | 71190 food | 71230 adm act | 71340 occ therapy | 71350 physio therapy | Adm_cost | Physio_cost |
1 | 100 | 600 | 100 | 600 | ||
2 | 700 | 400 | 400 | 1100 | 400 | |
3 | 300 | 200 | 500 | 0 | ||
4 | 750 | 40 | 750 | 40 | ||
5 | 350 | 150 | 500 | 0 | ||
6 | 500 | 400 | 50 | 500 | 450 | |
7 | 250 | 70 | 250 | 70 |
I'd like to have a variable "Adm_cost" based on sum of variables with prefix 71190 and 71230, and Physio_cost based on variables with prefix 71340 and 71350. I know I can sum them up by just listing e.g. adm_cost=sum('71190 food'n, '71230 adm act' n), but since the actual data I have
- has a long list of variables with spaces (all starting with 7, having 5 numerical values, a space, then description),
- I need to create 5 variables out of the list of variables
So I'm hoping to find a better way to handle the code, using such as array, to make the code more concise, so that I can easily change how I want to group variables.
Thanks in advance. I really appreciate your help in this community.
Better way is to name the variables as V1 V2 V3 ..... Vn
and define labels with the names you gave:
data have;
input v1 1
v2 3-5
v3 7-9
..........
;
LABEL
v1 = "FacNo"
v2 = '71190 food'
v3 = '71230 adm act'
.......................
;
datalines;
1 100 600
2 700 400 400
3 300 200
4 750 40
5 350 150
6 500 400 50
7 250 70
;
proc print noobs; run;
As @Shmuel suggested, you may be better off with more standard variable names. But if that is possible, rename e.g. '71190 food' to something more similar, such as V791190_food - then you can sum those variable using
adm_cost=sum(of V71190:,of V71230:)
That approach will not work with variable names that start with numbers.
If you cannot (or will not) rename your variables, here's a possible solution:
data want;
set have;
array nums _numeric_;
adm_cost=0;
physio_cost=0;
do _N_=1 to dim(nums);
prefix=substr(vname(nums(_N_)),1,5);
select(prefix);
when('71190','71230') adm_cost+nums(_N_);
when('71340','71350') physio_cost+nums(_N_);
otherwise;
end;
end;
run;
Thanks for the replies. Yes, I was left with no choice but to handle variable names with spaces. S_lassen's work perfectly. I only have one follow up questions That is what if my values are not a list of single values (A), but a range of values (B), a single value along with a range of values (C), or multiple ranges of values. How do I write the code. I have the following code written but the output values aren't right for B and C cases. Thanks in advance.
do _N_=1 to dim(nums);
prefix=substr(vname(nums(_N_)),1,5);
select (prefix);
when ('71205','71206','71207') IPadm+nums(_N_); /*A*/
when ('71360','71362','71365','71367','71369') ACdaysurg+nums(_N_); /*A*/
when ('71410' <=prefix<= '71434') Diagnostic_Clinic_vst+nums(_N_); /*B*/
when ('71435', '71440' < prefix<= '71499') Allied_health+nums(_N_); /*C*/
otherwise;
end;
end;
Yes I can use format, but I've a horizontal data and I prefer to stay as is, not to output it into a vertical data, apply format, and then transpose it back to a horizontal data, after summing. s_lassen's code works well, but I just need to know how to tweak the code to allow specifying 1) a range of values, and 2) a range of values along with multiple single values. Hope the explanation helps.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.