Apply substring Function to all variables in a dataset

Accepted Solution Solved
Reply
New Contributor
Posts: 3
Accepted Solution

Apply substring Function to all variables in a dataset

I'm looking to apply a substring function to all columns in a dataset. Currently I've only had success doing it to one column at a time, but this isn't efficient or viable as this is a macro that won't always know the names or the amount of columns(There's also 100's of columns)

 

data have:
Var1      Var2         Var3       ...  VarN

01-AB    NA            NA             99-XX

NA         NA            14-EF        99-XX

NA         08-BC       NA             99-XX

 

data want:
Var1      Var2         Var3       ... VarN

01                                             99

                              14               99

               08                              99

 

As I said I can achieve the desired result using the below code, but I have to specify the column names. Really, I need the following function to apply to ALL columns, without specifying their names. I ran into the same problem using the compress function.

 

data Have;
set Want;
length var1 4;
var1  = substr(var1, 1, index(var1, '-') - 1);
run;

 

 

Thanks in Advance


Accepted Solutions
Solution
‎06-06-2018 12:16 PM
PROC Star
Posts: 1,784

Re: Apply substring Function to all variables in a dataset

data have;
input (Var1      Var2         Var3        Var4) ($);
cards;
01-AB    NA            NA             99-XX
NA         NA            14-EF        99-XX
NA         08-BC       NA             99-XX
;

data want;
set have;
array t(*) var:;
do _n_=1 to dim(t);
if index(t(_n_), '-')>0 then t(_n_)=substr(t(_n_), 1, index(t(_n_), '-') - 1);
else call missing(t(_n_));
end;
run;

View solution in original post


All Replies
Solution
‎06-06-2018 12:16 PM
PROC Star
Posts: 1,784

Re: Apply substring Function to all variables in a dataset

data have;
input (Var1      Var2         Var3        Var4) ($);
cards;
01-AB    NA            NA             99-XX
NA         NA            14-EF        99-XX
NA         08-BC       NA             99-XX
;

data want;
set have;
array t(*) var:;
do _n_=1 to dim(t);
if index(t(_n_), '-')>0 then t(_n_)=substr(t(_n_), 1, index(t(_n_), '-') - 1);
else call missing(t(_n_));
end;
run;
New Contributor
Posts: 3

Re: Apply substring Function to all variables in a dataset

Posted in reply to novinosrin

Fantastic! Thank you this is perfect!

Super User
Posts: 23,700

Re: Apply substring Function to all variables in a dataset

[ Edited ]

Are your variables actually named Var1-VarN? Do they have a specific naming convention? Do you have all character variables or mixed types?

 

You can use _character_ and _numeric_ to list all character and numeric variables respectively. 

 


@OPrxwr7 wrote:

I'm looking to apply a substring function to all columns in a dataset. Currently I've only had success doing it to one column at a time, but this isn't efficient or viable as this is a macro that won't always know the names or the amount of columns(There's also 100's of columns)

 

data have:
Var1      Var2         Var3       ...  VarN

01-AB    NA            NA             99-XX

NA         NA            14-EF        99-XX

NA         08-BC       NA             99-XX

 

data want:
Var1      Var2         Var3       ... VarN

01                                             99

                              14               99

               08                              99

 

As I said I can achieve the desired result using the below code, but I have to specify the column names. Really, I need the following function to apply to ALL columns, without specifying their names. I ran into the same problem using the compress function.

 

data Have;
set Want;
length var1 4;
var1  = substr(var1, 1, index(var1, '-') - 1);
run;

 

 

Thanks in Advance


 

 

New Contributor
Posts: 3

Re: Apply substring Function to all variables in a dataset

Sorry I should have mentioned, yes all the variable names will have a consistent "var" at the beginning, so novinosrin's use of the wildcards will work. Thanks for looking into it though! 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 107 views
  • 3 likes
  • 3 in conversation