DATA Step, Macro, Functions and more

What is the best way to analyze numerous variables beginning with a specific text such as 'abc'

Accepted Solution Solved
Reply
New Contributor
Posts: 2
Accepted Solution

What is the best way to analyze numerous variables beginning with a specific text such as 'abc'

I have a variable with numerous (and varying) number of elements within it that is separated by a delimiter (Smiley Wink and the following code below creates new variables for each observation based on the number of elements. Now as I am working through my analysis I am only interested in analyzing variables within the newly created ~30 that begin with 'abc' and was hoping someone would provide insight into the best way to go about this using SAS 9.4. The intended outcome is to identify key text after the initial 'abc' as well as to determine if key text is missing after the initial 'abc.'

 

*Building an array without knowing the max number of variables;
*proc sql stores a macro variable to determine the number of elements in the string to be applied -
then the data step parses by the delimiter into the set amount determined by the proc sql step;
proc sql noprint;
select max(count(original,';'))+1 into :maxelements from have;
Data want (drop=i);
set have;
array parsed_vars $ new1-new%eval(&maxelements);
do i=1 to &maxelements;
parsed_vars(i)=scan(original,i,";");
end;
run;


Accepted Solutions
Solution
‎11-29-2017 03:19 PM
Super User
Super User
Posts: 9,211

Re: What is the best way to analyze numerous variables beginning with a specific text such as 'abc'

You probably don't need to split them up even to do that see below.  However that being said it would really be a good idea to split them up, but in long form rather than wide:

data want (keep=wrd);
  set have;
  length wrd $20;
  do i=1 to countw(original,",");
    wrd=scan(original,i,",");
    output;
  end;
run;

That is a far simpler and easier to work with data structure.

 

To show how to co it without changing:

proc sql;
  create table WANT as
  select sum(count(ORIGINAL,"TT")) as WANT
  from   HAVE;
quit;

View solution in original post


All Replies
Super User
Super User
Posts: 9,211

Re: What is the best way to analyze numerous variables beginning with a specific text such as 'abc'

Without seeing something in the way of data its hard to say.  The "best way" is subjective, you could for instance not create 30 observations, but output to rows, then utilise by grouping to analyse the data.  Or, if your variables have a common prefix you could use shortened versions:

result=max(of var:);

Assumes all the new variables have a prefix of var.  Or you could use lists, var1--var10.  There are many methods, and I would certainly avoid going down the macro route until you absolutely have to.  Also, when posting a question, post test data in the form of a datastep - this helps us to see the data and the structure, just saying 30 elements is meaningless - could be number or character or any mix.  For instance if they are all number, _numeric_ is a quick way of referring to them.

New Contributor
Posts: 2

Re: What is the best way to analyze numerous variables beginning with a specific text such as 'abc'

Thanks for the suggestions.

 

Original dataset with original variable:

 

obs    original

1        AA456; TT3; BB345

2        TT3X5

3        AA234; BB35X; CC352

 

 

Parsed original variable into new variables as discussed in original post:

 

obs    original                                      new1        new2       new3         .....

1        AA456; TT3; BB345                 AA456       TT3        BB345       .....

2        TT3X5                                      TT3X5                                       .....

3        AA234; BB35X; CC352            AA234     BB35X    CC352       .....  

 

The output is how many observations have variables beginning with TT and additionally have less than 4 characters.

Solution
‎11-29-2017 03:19 PM
Super User
Super User
Posts: 9,211

Re: What is the best way to analyze numerous variables beginning with a specific text such as 'abc'

You probably don't need to split them up even to do that see below.  However that being said it would really be a good idea to split them up, but in long form rather than wide:

data want (keep=wrd);
  set have;
  length wrd $20;
  do i=1 to countw(original,",");
    wrd=scan(original,i,",");
    output;
  end;
run;

That is a far simpler and easier to work with data structure.

 

To show how to co it without changing:

proc sql;
  create table WANT as
  select sum(count(ORIGINAL,"TT")) as WANT
  from   HAVE;
quit;
Super User
Posts: 22,850

Re: What is the best way to analyze numerous variables beginning with a specific text such as 'abc'

Post some sample data and your expected output. 

 


Keaton wrote:

I have a variable with numerous (and varying) number of elements within it that is separated by a delimiter (Smiley Wink and the following code below creates new variables for each observation based on the number of elements. Now as I am working through my analysis I am only interested in analyzing variables within the newly created ~30 that begin with 'abc' and was hoping someone would provide insight into the best way to go about this using SAS 9.4. The intended outcome is to identify key text after the initial 'abc' as well as to determine if key text is missing after the initial 'abc.'

 

*Building an array without knowing the max number of variables;
*proc sql stores a macro variable to determine the number of elements in the string to be applied -
then the data step parses by the delimiter into the set amount determined by the proc sql step;
proc sql noprint;
select max(count(original,';'))+1 into :maxelements from have;
Data want (drop=i);
set have;
array parsed_vars $ new1-new%eval(&maxelements);
do i=1 to &maxelements;
parsed_vars(i)=scan(original,i,";");
end;
run;


 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 173 views
  • 0 likes
  • 3 in conversation