BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Keaton
Calcite | Level 5

I have a variable with numerous (and varying) number of elements within it that is separated by a delimiter (;) and the following code below creates new variables for each observation based on the number of elements. Now as I am working through my analysis I am only interested in analyzing variables within the newly created ~30 that begin with 'abc' and was hoping someone would provide insight into the best way to go about this using SAS 9.4. The intended outcome is to identify key text after the initial 'abc' as well as to determine if key text is missing after the initial 'abc.'

 

*Building an array without knowing the max number of variables;
*proc sql stores a macro variable to determine the number of elements in the string to be applied -
then the data step parses by the delimiter into the set amount determined by the proc sql step;
proc sql noprint;
select max(count(original,';'))+1 into :maxelements from have;
Data want (drop=i);
set have;
array parsed_vars $ new1-new%eval(&maxelements);
do i=1 to &maxelements;
parsed_vars(i)=scan(original,i,";");
end;
run;

1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

You probably don't need to split them up even to do that see below.  However that being said it would really be a good idea to split them up, but in long form rather than wide:

data want (keep=wrd);
  set have;
  length wrd $20;
  do i=1 to countw(original,",");
    wrd=scan(original,i,",");
    output;
  end;
run;

That is a far simpler and easier to work with data structure.

 

To show how to co it without changing:

proc sql;
  create table WANT as
  select sum(count(ORIGINAL,"TT")) as WANT
  from   HAVE;
quit;

View solution in original post

4 REPLIES 4
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Without seeing something in the way of data its hard to say.  The "best way" is subjective, you could for instance not create 30 observations, but output to rows, then utilise by grouping to analyse the data.  Or, if your variables have a common prefix you could use shortened versions:

result=max(of var:);

Assumes all the new variables have a prefix of var.  Or you could use lists, var1--var10.  There are many methods, and I would certainly avoid going down the macro route until you absolutely have to.  Also, when posting a question, post test data in the form of a datastep - this helps us to see the data and the structure, just saying 30 elements is meaningless - could be number or character or any mix.  For instance if they are all number, _numeric_ is a quick way of referring to them.

Keaton
Calcite | Level 5

Thanks for the suggestions.

 

Original dataset with original variable:

 

obs    original

1        AA456; TT3; BB345

2        TT3X5

3        AA234; BB35X; CC352

 

 

Parsed original variable into new variables as discussed in original post:

 

obs    original                                      new1        new2       new3         .....

1        AA456; TT3; BB345                 AA456       TT3        BB345       .....

2        TT3X5                                      TT3X5                                       .....

3        AA234; BB35X; CC352            AA234     BB35X    CC352       .....  

 

The output is how many observations have variables beginning with TT and additionally have less than 4 characters.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

You probably don't need to split them up even to do that see below.  However that being said it would really be a good idea to split them up, but in long form rather than wide:

data want (keep=wrd);
  set have;
  length wrd $20;
  do i=1 to countw(original,",");
    wrd=scan(original,i,",");
    output;
  end;
run;

That is a far simpler and easier to work with data structure.

 

To show how to co it without changing:

proc sql;
  create table WANT as
  select sum(count(ORIGINAL,"TT")) as WANT
  from   HAVE;
quit;
Reeza
Super User

Post some sample data and your expected output. 

 


@Keaton wrote:

I have a variable with numerous (and varying) number of elements within it that is separated by a delimiter (;) and the following code below creates new variables for each observation based on the number of elements. Now as I am working through my analysis I am only interested in analyzing variables within the newly created ~30 that begin with 'abc' and was hoping someone would provide insight into the best way to go about this using SAS 9.4. The intended outcome is to identify key text after the initial 'abc' as well as to determine if key text is missing after the initial 'abc.'

 

*Building an array without knowing the max number of variables;
*proc sql stores a macro variable to determine the number of elements in the string to be applied -
then the data step parses by the delimiter into the set amount determined by the proc sql step;
proc sql noprint;
select max(count(original,';'))+1 into :maxelements from have;
Data want (drop=i);
set have;
array parsed_vars $ new1-new%eval(&maxelements);
do i=1 to &maxelements;
parsed_vars(i)=scan(original,i,";");
end;
run;


 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1511 views
  • 0 likes
  • 3 in conversation