BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ayin
Quartz | Level 8

The orginal dataset has columns like:

Time   A_1   A_2   B_1   B_2   C_1

and all the columns except 'Time' have numeric values.

 

Need the codes that can automatically add new columns: in this case, 

A_All  B_All  C_All

and

A_All = A_1 + A_2
B_All = B_1 + B_2
C_All = C_1

 

The problem is that, the column names of the original dataset, instead of 'A_:' 'B_:' 'C_:', can be anything (e.g. 'Apple_:', 'Test_:' etc.;  the new columns would accordingly have names like 'Apple_All', 'Test_All').

In addition, there could be any number of 'A_:' columns: e.g. 'A_1-A_7'

Hard coding like:

A_All = sum(of A_1-A_2);

wouldn't work therefore.

 

I'm expecting codes that achieve the following objectives:

1.  Read column names from the original dataset - have;
2.1 for columns whose names have a format like 'alphabetic characters + underscore + numbers': e.g. Type_3
2.2 identify how many columns there are whose names have the same alphabetic characters: e.g. 'Type'
3.  create new columns whose names are 'same alphabetic characters + underscore + "All"':   e.g. Type_All
4.  the value for the new column is therefore: e.g. sum(of Type_1-Type_&NumColType).

 

To make the question easier, the numeric part of the column names always starts from '1', and increases by 1. e.g. for 'Type_:' columns, it would be Type_1 Type_2 Type_3.

1 ACCEPTED SOLUTION

Accepted Solutions
s_lassen
Meteorite | Level 14

Ayin,

I would try something like this:

/* test data */
data have;
 retain apple_1-apple_8 g3-g6 g_3-g_6 1;
run;

proc sql;
  create table columns as select name from dictionary.columns
  where libname='WORK' and memname='HAVE' and name ? '_' and type='num'
  order by upcase(name);
quit;
filename tempsas temp;
data columns;
  set columns;
  prxid=prxparse('/^(.+_)\d+\s+$/');
  if prxmatch(prxid,name);
  prefix=prxposn(prxid,1,name);
  file tempsas;
  if upcase(prefix) ne lag(upcase(prefix)) then 
    put prefix +(-1) 'All = sum(of ' prefix +(-1) ':);';
run;
 
data want;
  set have;
  %include tempsas/source2;
run;

 

 

The SQL gets the names of numeric columns that contain an underscore. I sorted them by uppercase values in case of names like A_1, a_2 etc., which I assume that you want to group together. 

The data step writes a temporary SAS program which does the calculation. The PRXPARSE expression looks for beginning of string, a number of any characters followed by an underscore (the paranthesis catches this with a capture buffer), followed by a number of digits and finally optional whitespace. The contents of the capture buffer is put in the PREFIX variable, when the uppercase value of that changes, a line is written to the temporary SAS file.

 

The final step includes that file to calculate the sums.

Regards,

Søren

View solution in original post

9 REPLIES 9
Reeza
Super User

Look at using the SASHELP VCOLUMN table which will have all the variables and you can try and create your logic. 

ayin
Quartz | Level 8
Thanks for pointing out the direction.
Astounding
PROC Star

Note that there are other types of variable lists that can be used.  Most notably:

 

A_total = sum(of a_: );

 

A_: is an abbreviation for all variable names that begin with A_.

s_lassen
Meteorite | Level 14

Ayin,

I would try something like this:

/* test data */
data have;
 retain apple_1-apple_8 g3-g6 g_3-g_6 1;
run;

proc sql;
  create table columns as select name from dictionary.columns
  where libname='WORK' and memname='HAVE' and name ? '_' and type='num'
  order by upcase(name);
quit;
filename tempsas temp;
data columns;
  set columns;
  prxid=prxparse('/^(.+_)\d+\s+$/');
  if prxmatch(prxid,name);
  prefix=prxposn(prxid,1,name);
  file tempsas;
  if upcase(prefix) ne lag(upcase(prefix)) then 
    put prefix +(-1) 'All = sum(of ' prefix +(-1) ':);';
run;
 
data want;
  set have;
  %include tempsas/source2;
run;

 

 

The SQL gets the names of numeric columns that contain an underscore. I sorted them by uppercase values in case of names like A_1, a_2 etc., which I assume that you want to group together. 

The data step writes a temporary SAS program which does the calculation. The PRXPARSE expression looks for beginning of string, a number of any characters followed by an underscore (the paranthesis catches this with a capture buffer), followed by a number of digits and finally optional whitespace. The contents of the capture buffer is put in the PREFIX variable, when the uppercase value of that changes, a line is written to the temporary SAS file.

 

The final step includes that file to calculate the sums.

Regards,

Søren

ballardw
Super User

Your rules or process are missing at least one step: If your existing variable has a name with 29 or more characters you cannot create a new SAS variable by adding  "_All" to the existing name as the length would exceed the maximumn number of characters allowed.

s_lassen
Meteorite | Level 14

Thank you for your comment.

 

In principle you may be right. But it did not seem like that was a problem, and the OP did not specify what to do in that case. This is not a general macro for system-wide use, it is an efficient solution to a user's actual problem as stated.

 

BTW, do you know why my SAS code section is collapsed to one line when I view it? I pasted it from a SAS Studio program editor window, is it something with CRLF vs. LF as line breaks (meaning that I should edit the code in UltraEdit or similar first, or similar), or is it my browser (Opera) that has gone crazy? What does it look like in your browser?

Reeza
Super User

It collapsed. I'm not sure why., I tried to fix it but it looks fine when previewed in editor. 

ballardw
Super User

@s_lassen wrote:

Thank you for your comment.

 

In principle you may be right. But it did not seem like that was a problem, and the OP did not specify what to do in that case. This is not a general macro for system-wide use, it is an efficient solution to a user's actual problem as stated.

 


@s_lassen I assume you are responding to my comment about the variable name length. I was not responding to your post but the original poster. The OP may not realize this type of potential problem with the actual data involved.

This forum has some very odd display properties that it is not always obvious who the response is from unless you use the quote function so I can see why you may think I was commenting on your code..

 

The bit about how your code appears may come from using the "run" icon for posting. Code seems to work better with {i} code box as it does not attempt to maintain and appearance propertlies (font color for instance) and sometimes seems the other box does not interpret the tags correctly.

Reeza
Super User

@ballardw That's intersting, I've always used the running man/notebook icon because that's what I find works best.

 

@s_lassen I paste it into a text editor and then paste it back into the code block and that seems to fix the single line display issue.

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 2021 views
  • 2 likes
  • 5 in conversation