BookmarkSubscribeRSS Feed
Solph
Pyrite | Level 9

I've a data set with a long list of numerical variables with spaces (all starting with numerical 7). I need to sum up some of the variables into 5-6 new variables. Here is a sample data with just 4 variables.

 

data have; 
input FacNo 1 '71190 food'n 3-5 '71230 adm act'n 7-9 '71340 occ therapy'n 11-13 '71350 physio therapy'n 15-16; datalines; 1 100 600 2 700 400 400 3 300 200 4 750 40 5 350 150 6 500 400 50 7 250 70 ; proc print noobs; run;

 

Here is what I'd like to have

FacNo 71190 food 71230 adm act 71340 occ therapy 71350 physio therapy Adm_cost Physio_cost
1 100   600   100 600
2 700 400 400   1100 400
3 300 200     500 0
4 750     40 750 40
5 350 150     500 0
6 500   400 50 500 450
7 250     70 250 70


I'd like to have a variable "Adm_cost" based on sum of variables with prefix 71190 and 71230, and Physio_cost based on variables with prefix 71340 and 71350. I know I can sum them up by just listing e.g. adm_cost=sum('71190 food'n, '71230 adm act' n), but since the actual data I have

- has a long list of variables with spaces (all starting with 7, having 5 numerical values, a space, then description),

- I need to create 5 variables out of the list of variables

So I'm hoping to find a better way to handle the code, using such as array, to make the code more concise, so that I can easily change how I want to group variables.

 

Thanks in advance. I really appreciate your help in this community.

7 REPLIES 7
Shmuel
Garnet | Level 18

Better way is to name the variables as V1 V2 V3 ..... Vn

and define labels with the names you gave:

data have;  
    input   v1    1 
               v2    3-5
               v3    7-9
             ..........
     ;
LABEL
   v1 = "FacNo"
   v2 =  '71190 food'
   v3 =  '71230 adm act'
  .......................
 ;
datalines;
1 100     600  
2 700 400 400  
3 300 200 
4 750         40
5 350 150	    
6 500     400 50
7 250         70
;
proc print noobs; run;
s_lassen
Meteorite | Level 14

As @Shmuel suggested, you may be better off with more standard variable names. But if that is possible, rename e.g. '71190 food' to something more similar, such as V791190_food - then you can sum those variable using

adm_cost=sum(of V71190:,of V71230:)

That approach will not work with variable names that start with numbers.

 

If you cannot (or will not) rename your variables, here's a possible solution:

data want;
  set have;
  array nums _numeric_;
  adm_cost=0;
  physio_cost=0;
  do _N_=1 to dim(nums);
    prefix=substr(vname(nums(_N_)),1,5);
	select(prefix);
	  when('71190','71230') adm_cost+nums(_N_);
	  when('71340','71350') physio_cost+nums(_N_);
	  otherwise;
	  end;
    end;
run;

 

Solph
Pyrite | Level 9

Thanks for the replies. Yes, I was left with no choice but to handle variable names with spaces. S_lassen's work perfectly. I only have one follow up questions That is what if my values are not a list of single values (A), but a range of values (B), a single value along with a range of values (C), or multiple ranges of values. How do I write the code. I have the following code written but the output values aren't right for B and C cases. Thanks in advance.

 

 

do _N_=1 to dim(nums);
    prefix=substr(vname(nums(_N_)),1,5);
	select (prefix);
           when ('71205','71206','71207') IPadm+nums(_N_);   /*A*/
when ('71360','71362','71365','71367','71369') ACdaysurg+nums(_N_); /*A*/ when ('71410' <=prefix<= '71434') Diagnostic_Clinic_vst+nums(_N_); /*B*/ when ('71435', '71440' < prefix<= '71499') Allied_health+nums(_N_); /*C*/
otherwise; end; end;

 

 

Peter_C
Rhodochrosite | Level 12
Have a look APPROX FORMAT.
It allows you to provide a description for multiple individual codes and ranges. With this kind of user defined format all / most procedures combine values with codes which have the same formatted description
Peter_C
Rhodochrosite | Level 12
PROC FORMAT
Solph
Pyrite | Level 9

Yes I can use format, but I've a horizontal data and I prefer to stay as is, not to output it into a vertical data, apply format, and then transpose it back to a horizontal data, after summing. s_lassen's code works well, but I just need to know how to tweak the code to allow specifying 1) a range of values, and 2) a range of values along with multiple single values. Hope the explanation helps.

Peter_C
Rhodochrosite | Level 12
You can use a user defined format in a put() function, instead of

prefix=substr(vname(nums(_N_)),1,5);
select(prefix);
when('71190','71230') adm_cost+nums(_N_);
when('71340','71350')

Use
Group_ID = put( prefix, $your_fmt. ) ;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1224 views
  • 0 likes
  • 4 in conversation