DATA Step, Macro, Functions and more

Automatically create and name new columns (Advanced)

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 80
Accepted Solution

Automatically create and name new columns (Advanced)

[ Edited ]

The orginal dataset has columns like:

Time   A_1   A_2   B_1   B_2   C_1

and all the columns except 'Time' have numeric values.

 

Need the codes that can automatically add new columns: in this case, 

A_All  B_All  C_All

and

A_All = A_1 + A_2
B_All = B_1 + B_2
C_All = C_1

 

The problem is that, the column names of the original dataset, instead of 'A_:' 'B_:' 'C_:', can be anything (e.g. 'Apple_:', 'Test_:' etc.;  the new columns would accordingly have names like 'Apple_All', 'Test_All').

In addition, there could be any number of 'A_:' columns: e.g. 'A_1-A_7'

Hard coding like:

A_All = sum(of A_1-A_2);

wouldn't work therefore.

 

I'm expecting codes that achieve the following objectives:

1.  Read column names from the original dataset - have;
2.1 for columns whose names have a format like 'alphabetic characters + underscore + numbers': e.g. Type_3
2.2 identify how many columns there are whose names have the same alphabetic characters: e.g. 'Type'
3.  create new columns whose names are 'same alphabetic characters + underscore + "All"':   e.g. Type_All
4.  the value for the new column is therefore: e.g. sum(of Type_1-Type_&NumColType).

 

To make the question easier, the numeric part of the column names always starts from '1', and increases by 1. e.g. for 'Type_:' columns, it would be Type_1 Type_2 Type_3.


Accepted Solutions
Solution
‎03-28-2017 06:35 PM
PROC Star
Posts: 63

Re: Automatically create and name new columns (Advanced)

[ Edited ]

Ayin,

I would try something like this:

/* test data */
data have;
 retain apple_1-apple_8 g3-g6 g_3-g_6 1;
run;

proc sql;
  create table columns as select name from dictionary.columns
  where libname='WORK' and memname='HAVE' and name ? '_' and type='num'
  order by upcase(name);
quit;
filename tempsas temp;
data columns;
  set columns;
  prxid=prxparse('/^(.+_)\d+\s+$/');
  if prxmatch(prxid,name);
  prefix=prxposn(prxid,1,name);
  file tempsas;
  if upcase(prefix) ne lag(upcase(prefix)) then 
    put prefix +(-1) 'All = sum(of ' prefix +(-1) ':);';
run;
 
data want;
  set have;
  %include tempsas/source2;
run;

 

 

The SQL gets the names of numeric columns that contain an underscore. I sorted them by uppercase values in case of names like A_1, a_2 etc., which I assume that you want to group together. 

The data step writes a temporary SAS program which does the calculation. The PRXPARSE expression looks for beginning of string, a number of any characters followed by an underscore (the paranthesis catches this with a capture buffer), followed by a number of digits and finally optional whitespace. The contents of the capture buffer is put in the PREFIX variable, when the uppercase value of that changes, a line is written to the temporary SAS file.

 

The final step includes that file to calculate the sums.

Regards,

Søren

View solution in original post


All Replies
Super User
Posts: 17,829

Re: Automatically create and name new columns

Look at using the SASHELP VCOLUMN table which will have all the variables and you can try and create your logic. 

Frequent Contributor
Posts: 80

Re: Automatically create and name new columns

Thanks for pointing out the direction.
Super User
Posts: 5,083

Re: Automatically create and name new columns (Advanced)

[ Edited ]

Note that there are other types of variable lists that can be used.  Most notably:

 

A_total = sum(of a_: );

 

A_: is an abbreviation for all variable names that begin with A_.

Solution
‎03-28-2017 06:35 PM
PROC Star
Posts: 63

Re: Automatically create and name new columns (Advanced)

[ Edited ]

Ayin,

I would try something like this:

/* test data */
data have;
 retain apple_1-apple_8 g3-g6 g_3-g_6 1;
run;

proc sql;
  create table columns as select name from dictionary.columns
  where libname='WORK' and memname='HAVE' and name ? '_' and type='num'
  order by upcase(name);
quit;
filename tempsas temp;
data columns;
  set columns;
  prxid=prxparse('/^(.+_)\d+\s+$/');
  if prxmatch(prxid,name);
  prefix=prxposn(prxid,1,name);
  file tempsas;
  if upcase(prefix) ne lag(upcase(prefix)) then 
    put prefix +(-1) 'All = sum(of ' prefix +(-1) ':);';
run;
 
data want;
  set have;
  %include tempsas/source2;
run;

 

 

The SQL gets the names of numeric columns that contain an underscore. I sorted them by uppercase values in case of names like A_1, a_2 etc., which I assume that you want to group together. 

The data step writes a temporary SAS program which does the calculation. The PRXPARSE expression looks for beginning of string, a number of any characters followed by an underscore (the paranthesis catches this with a capture buffer), followed by a number of digits and finally optional whitespace. The contents of the capture buffer is put in the PREFIX variable, when the uppercase value of that changes, a line is written to the temporary SAS file.

 

The final step includes that file to calculate the sums.

Regards,

Søren

Super User
Posts: 10,500

Re: Automatically create and name new columns (Advanced)

Your rules or process are missing at least one step: If your existing variable has a name with 29 or more characters you cannot create a new SAS variable by adding  "_All" to the existing name as the length would exceed the maximumn number of characters allowed.

PROC Star
Posts: 63

Re: Automatically create and name new columns (Advanced)

Thank you for your comment.

 

In principle you may be right. But it did not seem like that was a problem, and the OP did not specify what to do in that case. This is not a general macro for system-wide use, it is an efficient solution to a user's actual problem as stated.

 

BTW, do you know why my SAS code section is collapsed to one line when I view it? I pasted it from a SAS Studio program editor window, is it something with CRLF vs. LF as line breaks (meaning that I should edit the code in UltraEdit or similar first, or similar), or is it my browser (Opera) that has gone crazy? What does it look like in your browser?

Super User
Posts: 17,829

Re: Automatically create and name new columns (Advanced)

It collapsed. I'm not sure why., I tried to fix it but it looks fine when previewed in editor. 

Super User
Posts: 10,500

Re: Automatically create and name new columns (Advanced)


s_lassen wrote:

Thank you for your comment.

 

In principle you may be right. But it did not seem like that was a problem, and the OP did not specify what to do in that case. This is not a general macro for system-wide use, it is an efficient solution to a user's actual problem as stated.

 


@s_lassen I assume you are responding to my comment about the variable name length. I was not responding to your post but the original poster. The OP may not realize this type of potential problem with the actual data involved.

This forum has some very odd display properties that it is not always obvious who the response is from unless you use the quote function so I can see why you may think I was commenting on your code..

 

The bit about how your code appears may come from using the "run" icon for posting. Code seems to work better with {i} code box as it does not attempt to maintain and appearance propertlies (font color for instance) and sometimes seems the other box does not interpret the tags correctly.

Super User
Posts: 17,829

Re: Automatically create and name new columns (Advanced)

@ballardw That's intersting, I've always used the running man/notebook icon because that's what I find works best.

 

@s_lassen I paste it into a text editor and then paste it back into the code block and that seems to fix the single line display issue.

 

 

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 243 views
  • 2 likes
  • 5 in conversation