I have a variable with numerous (and varying) number of elements within it that is separated by a delimiter (;) and the following code below creates new variables for each observation based on the number of elements. Now as I am working through my analysis I am only interested in analyzing variables within the newly created ~30 that begin with 'abc' and was hoping someone would provide insight into the best way to go about this using SAS 9.4. The intended outcome is to identify key text after the initial 'abc' as well as to determine if key text is missing after the initial 'abc.'
*Building an array without knowing the max number of variables;
*proc sql stores a macro variable to determine the number of elements in the string to be applied -
then the data step parses by the delimiter into the set amount determined by the proc sql step;
proc sql noprint;
select max(count(original,';'))+1 into :maxelements from have;
Data want (drop=i);
set have;
array parsed_vars $ new1-new%eval(&maxelements);
do i=1 to &maxelements;
parsed_vars(i)=scan(original,i,";");
end;
run;
You probably don't need to split them up even to do that see below. However that being said it would really be a good idea to split them up, but in long form rather than wide:
data want (keep=wrd); set have; length wrd $20; do i=1 to countw(original,","); wrd=scan(original,i,","); output; end; run;
That is a far simpler and easier to work with data structure.
To show how to co it without changing:
proc sql; create table WANT as select sum(count(ORIGINAL,"TT")) as WANT from HAVE; quit;
Without seeing something in the way of data its hard to say. The "best way" is subjective, you could for instance not create 30 observations, but output to rows, then utilise by grouping to analyse the data. Or, if your variables have a common prefix you could use shortened versions:
result=max(of var:);
Assumes all the new variables have a prefix of var. Or you could use lists, var1--var10. There are many methods, and I would certainly avoid going down the macro route until you absolutely have to. Also, when posting a question, post test data in the form of a datastep - this helps us to see the data and the structure, just saying 30 elements is meaningless - could be number or character or any mix. For instance if they are all number, _numeric_ is a quick way of referring to them.
Thanks for the suggestions.
Original dataset with original variable:
obs original
1 AA456; TT3; BB345
2 TT3X5
3 AA234; BB35X; CC352
Parsed original variable into new variables as discussed in original post:
obs original new1 new2 new3 .....
1 AA456; TT3; BB345 AA456 TT3 BB345 .....
2 TT3X5 TT3X5 .....
3 AA234; BB35X; CC352 AA234 BB35X CC352 .....
The output is how many observations have variables beginning with TT and additionally have less than 4 characters.
You probably don't need to split them up even to do that see below. However that being said it would really be a good idea to split them up, but in long form rather than wide:
data want (keep=wrd); set have; length wrd $20; do i=1 to countw(original,","); wrd=scan(original,i,","); output; end; run;
That is a far simpler and easier to work with data structure.
To show how to co it without changing:
proc sql; create table WANT as select sum(count(ORIGINAL,"TT")) as WANT from HAVE; quit;
Post some sample data and your expected output.
@Keaton wrote:
I have a variable with numerous (and varying) number of elements within it that is separated by a delimiter (;) and the following code below creates new variables for each observation based on the number of elements. Now as I am working through my analysis I am only interested in analyzing variables within the newly created ~30 that begin with 'abc' and was hoping someone would provide insight into the best way to go about this using SAS 9.4. The intended outcome is to identify key text after the initial 'abc' as well as to determine if key text is missing after the initial 'abc.'
*Building an array without knowing the max number of variables;
*proc sql stores a macro variable to determine the number of elements in the string to be applied -
then the data step parses by the delimiter into the set amount determined by the proc sql step;
proc sql noprint;
select max(count(original,';'))+1 into :maxelements from have;
Data want (drop=i);
set have;
array parsed_vars $ new1-new%eval(&maxelements);
do i=1 to &maxelements;
parsed_vars(i)=scan(original,i,";");
end;
run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.