BookmarkSubscribeRSS Feed
Solph
Pyrite | Level 9

I've a data set with a long list of numerical variables with spaces (all starting with numerical 7). I need to sum up some of the variables into 5-6 new variables. Here is a sample data with just 4 variables.

 

data have; 
input FacNo 1 '71190 food'n 3-5 '71230 adm act'n 7-9 '71340 occ therapy'n 11-13 '71350 physio therapy'n 15-16; datalines; 1 100 600 2 700 400 400 3 300 200 4 750 40 5 350 150 6 500 400 50 7 250 70 ; proc print noobs; run;

 

Here is what I'd like to have

FacNo 71190 food 71230 adm act 71340 occ therapy 71350 physio therapy Adm_cost Physio_cost
1 100   600   100 600
2 700 400 400   1100 400
3 300 200     500 0
4 750     40 750 40
5 350 150     500 0
6 500   400 50 500 450
7 250     70 250 70


I'd like to have a variable "Adm_cost" based on sum of variables with prefix 71190 and 71230, and Physio_cost based on variables with prefix 71340 and 71350. I know I can sum them up by just listing e.g. adm_cost=sum('71190 food'n, '71230 adm act' n), but since the actual data I have

- has a long list of variables with spaces (all starting with 7, having 5 numerical values, a space, then description),

- I need to create 5 variables out of the list of variables

So I'm hoping to find a better way to handle the code, using such as array, to make the code more concise, so that I can easily change how I want to group variables.

 

Thanks in advance. I really appreciate your help in this community.

7 REPLIES 7
Shmuel
Garnet | Level 18

Better way is to name the variables as V1 V2 V3 ..... Vn

and define labels with the names you gave:

data have;  
    input   v1    1 
               v2    3-5
               v3    7-9
             ..........
     ;
LABEL
   v1 = "FacNo"
   v2 =  '71190 food'
   v3 =  '71230 adm act'
  .......................
 ;
datalines;
1 100     600  
2 700 400 400  
3 300 200 
4 750         40
5 350 150	    
6 500     400 50
7 250         70
;
proc print noobs; run;
s_lassen
Meteorite | Level 14

As @Shmuel suggested, you may be better off with more standard variable names. But if that is possible, rename e.g. '71190 food' to something more similar, such as V791190_food - then you can sum those variable using

adm_cost=sum(of V71190:,of V71230:)

That approach will not work with variable names that start with numbers.

 

If you cannot (or will not) rename your variables, here's a possible solution:

data want;
  set have;
  array nums _numeric_;
  adm_cost=0;
  physio_cost=0;
  do _N_=1 to dim(nums);
    prefix=substr(vname(nums(_N_)),1,5);
	select(prefix);
	  when('71190','71230') adm_cost+nums(_N_);
	  when('71340','71350') physio_cost+nums(_N_);
	  otherwise;
	  end;
    end;
run;

 

Solph
Pyrite | Level 9

Thanks for the replies. Yes, I was left with no choice but to handle variable names with spaces. S_lassen's work perfectly. I only have one follow up questions That is what if my values are not a list of single values (A), but a range of values (B), a single value along with a range of values (C), or multiple ranges of values. How do I write the code. I have the following code written but the output values aren't right for B and C cases. Thanks in advance.

 

 

do _N_=1 to dim(nums);
    prefix=substr(vname(nums(_N_)),1,5);
	select (prefix);
           when ('71205','71206','71207') IPadm+nums(_N_);   /*A*/
when ('71360','71362','71365','71367','71369') ACdaysurg+nums(_N_); /*A*/ when ('71410' <=prefix<= '71434') Diagnostic_Clinic_vst+nums(_N_); /*B*/ when ('71435', '71440' < prefix<= '71499') Allied_health+nums(_N_); /*C*/
otherwise; end; end;

 

 

Peter_C
Rhodochrosite | Level 12
Have a look APPROX FORMAT.
It allows you to provide a description for multiple individual codes and ranges. With this kind of user defined format all / most procedures combine values with codes which have the same formatted description
Peter_C
Rhodochrosite | Level 12
PROC FORMAT
Solph
Pyrite | Level 9

Yes I can use format, but I've a horizontal data and I prefer to stay as is, not to output it into a vertical data, apply format, and then transpose it back to a horizontal data, after summing. s_lassen's code works well, but I just need to know how to tweak the code to allow specifying 1) a range of values, and 2) a range of values along with multiple single values. Hope the explanation helps.

Peter_C
Rhodochrosite | Level 12
You can use a user defined format in a put() function, instead of

prefix=substr(vname(nums(_N_)),1,5);
select(prefix);
when('71190','71230') adm_cost+nums(_N_);
when('71340','71350')

Use
Group_ID = put( prefix, $your_fmt. ) ;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1552 views
  • 0 likes
  • 4 in conversation