I have been trying to create columns in sas based on substring. I need to create a coloumn of each sub string stored in a column seprated by '\'. For example:-
prexa\Medical Information\Medical Letters 02 is a single string stored in one column. i would like to have three columns col1: prexa col2 :Medical Information and Col3 : Medical. I am new to sas and have tried but unable to proceed.
this column has numeric values with no sub string as well.
Hi,
Well the first question you should be asking yourself is what is this data, how did I arrive with it. Now I do not know what the numbers are but the other part looks like some sort of directory listing. Is this data you have created? If so I would fix the code which generates this data, if it comes from a vendor then update your agreement so the data is in a useable format. Without knowledge of what the data is or how you came by it something like the below code might help, it creates a series of character variales for each split of the long string - note I do not do any conversions to number or anything else, just split the string up. The reason is I do not know this data, nor what its used for. 99% of any programming task is to understand your data and arrange that in a manner which is conducive to programming, the example data you provided looks like at least two different data items appended to each other - this is generally not good. Also, please post test data in the form of a datastep as text in your posts, its not great to have to type all that in to test code:
data have; infile datalines dlm="¬"; length long_string $2000; input long_string $; datalines; 6 7 3 4 P Toolbox\Customer program\peer to peer abc\ert\popt\abcdedgeg\defty ; run; proc sql noprint; select max(countw(LONG_STRING,"\")) into :NUM_ARRAY from HAVE; quit; data want; set have; array col{&num_array.} $100; do i=1 to countw(long_string,"\"); col{i}=scan(long_string,i,"\"); end; run;
I'm not sure this handles everything you see in your data, but the right tool to split up the text is the SCAN function:
long_string = "prexa\Medical Information\Medical";
col1 = scan(long_string, 1, '\');
col2 = scan(long_string, 2, '\');
col3 = scan(long_string, 3, '\');
Like this?
array COL [99] $32;
NUM=input(LONG_STRING, ?? dollar32.);
if NUM=. then do I=1 to 99 until (COL[I]='');
COL[I] = scan(LONG_STRING, I, '\');
end;
Edited: I had the informat name wrong !
Look up the input function documentation, it's all there. 🙂
thanks a lot. i will look into this. really Appericiate your help.
Thanks for the solution.
The number of observations isn't so important. It's the variables that matter.
You might consider skipping functions entirely and just using "\" as a delimiter when reading in the variables:
infile rawdata dlm="\" dsd;
input col1 $ col2 $ col3;
But you have to figure out where all the variables are. It won't do to figure out 3 at a time.
its not raw data file but a coloum with a string. and its substring are seprated by '/'. thanks again for all your help.
So given that you have a SAS data set with a single variable in it, what do you want the outcome to be at the end of all the processing?
And to populate the numeric variable you can add this:
NUM=input(LONG_STRING, ?? dollar32.);
if NUM=. then do;
COL1 = scan(LONG_STRING, 1, '\');
COL2 = scan(LONG_STRING, 2, '\');
COL3 = scan(LONG_STRING, 3, '\');
end;
Hi,
Well the first question you should be asking yourself is what is this data, how did I arrive with it. Now I do not know what the numbers are but the other part looks like some sort of directory listing. Is this data you have created? If so I would fix the code which generates this data, if it comes from a vendor then update your agreement so the data is in a useable format. Without knowledge of what the data is or how you came by it something like the below code might help, it creates a series of character variales for each split of the long string - note I do not do any conversions to number or anything else, just split the string up. The reason is I do not know this data, nor what its used for. 99% of any programming task is to understand your data and arrange that in a manner which is conducive to programming, the example data you provided looks like at least two different data items appended to each other - this is generally not good. Also, please post test data in the form of a datastep as text in your posts, its not great to have to type all that in to test code:
data have; infile datalines dlm="¬"; length long_string $2000; input long_string $; datalines; 6 7 3 4 P Toolbox\Customer program\peer to peer abc\ert\popt\abcdedgeg\defty ; run; proc sql noprint; select max(countw(LONG_STRING,"\")) into :NUM_ARRAY from HAVE; quit; data want; set have; array col{&num_array.} $100; do i=1 to countw(long_string,"\"); col{i}=scan(long_string,i,"\"); end; run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.