BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
vinayakraja
Fluorite | Level 6

I have been trying to create columns in sas based on substring. I need to create a coloumn of each sub string stored in a column seprated by '\'. For example:-

prexa\Medical Information\Medical Letters 02 is a single string stored in one column. i would like to have three columns col1: prexa col2 :Medical Information and Col3 : Medical. I am new to sas and have tried but unable to proceed.

this column has numeric values with no sub string as well.Content of Column.png

1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

 

Well the first question you should be asking yourself is what is this data, how did I arrive with it. Now I do not know what the numbers are but the other part looks like some sort of directory listing.  Is this data you have created?  If so I would fix the code which generates this data, if it comes from a vendor then update your agreement so the data is in a useable format.  Without knowledge of what the data is or how you came by it something like the below code might help, it creates a series of character variales for each split of the long string - note I do not do any conversions to number or anything else, just split the string up.  The reason is I do not know this data, nor what its used for.  99% of any programming task is to understand your data and arrange that in a manner which is conducive to programming, the example data you provided looks like at least two different data items appended to each other - this is generally not good.  Also, please post test data in the form of a datastep as text in your posts, its not great to have to type all that in to test code:

data have;
  infile datalines dlm="¬";
  length long_string $2000;
  input long_string $;
datalines;
6
7
3
4
P Toolbox\Customer program\peer to peer
abc\ert\popt\abcdedgeg\defty
;
run;

proc sql noprint;
  select  max(countw(LONG_STRING,"\"))
  into    :NUM_ARRAY
  from    HAVE;
quit;

data want;
  set have;
  array col{&num_array.} $100;
  do i=1 to countw(long_string,"\");
    col{i}=scan(long_string,i,"\");
  end;
run;

View solution in original post

16 REPLIES 16
Astounding
PROC Star

I'm not sure this handles everything you see in your data, but the right tool to split up the text is the SCAN function:

 

long_string = "prexa\Medical Information\Medical";

col1 = scan(long_string, 1, '\');

col2 = scan(long_string, 2, '\');

col3 = scan(long_string, 3, '\');

vinayakraja
Fluorite | Level 6
Thanks a lot for your reply. I have used scan function but the data just not contain 3 substring but multiple substring( shared just snapshot of data in image) . I am dealing with 100,000+ observations so I need to have something that can be dynamic. I have been thinking of using do while loop while putting substring function having '\' something similar. Can you please suggest ? Thanks again.


vinayakraja
Fluorite | Level 6
Thanks a lot for your reply. I have used scan function but the data just not contain 3 substring but multiple substring( shared just snapshot of data in image) . I am dealing with 100,000+ observations so I need to have something that can be dynamic. I have been thinking of using do while loop while putting substring function having '\' something similar. Can you please suggest ? Thanks again.
ChrisNZ
Tourmaline | Level 20

Like this?

 

  array COL [99] $32;               
  NUM=input(LONG_STRING, ?? dollar32.);       
  if NUM=. then do I=1 to 99 until (COL[I]='');
    COL[I] = scan(LONG_STRING, I, '\');      
  end;

Edited: I had the informat name wrong !

vinayakraja
Fluorite | Level 6
thanks a lot. can you please help me understand. NUM=input(LONG_STRING, ?? $32.); I understand input function but can you please explain the role of "??". i have never used it.
vinayakraja
Fluorite | Level 6

thanks a lot. i will look into this. really Appericiate your help.

Astounding
PROC Star

The number of observations isn't so important.  It's the variables that matter.

 

You might consider skipping functions entirely and just using "\" as a delimiter when reading in the variables:

 

infile rawdata dlm="\" dsd;

input col1 $ col2 $ col3;

 

But you have to figure out where all the variables are.  It won't do to figure out 3 at a time.

vinayakraja
Fluorite | Level 6

its not raw data file but a coloum with a string. and its substring are seprated by '/'. thanks again for all your help.

Astounding
PROC Star

So given that you have a SAS data set with a single variable in it, what do you want the outcome to be at the end of all the processing?

ChrisNZ
Tourmaline | Level 20

And to populate the numeric variable you can add this:

  NUM=input(LONG_STRING, ?? dollar32.);
  if NUM=. then do;
    COL1 = scan(LONG_STRING, 1, '\');
    COL2 = scan(LONG_STRING, 2, '\');
    COL3 = scan(LONG_STRING, 3, '\');
  end;

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

 

Well the first question you should be asking yourself is what is this data, how did I arrive with it. Now I do not know what the numbers are but the other part looks like some sort of directory listing.  Is this data you have created?  If so I would fix the code which generates this data, if it comes from a vendor then update your agreement so the data is in a useable format.  Without knowledge of what the data is or how you came by it something like the below code might help, it creates a series of character variales for each split of the long string - note I do not do any conversions to number or anything else, just split the string up.  The reason is I do not know this data, nor what its used for.  99% of any programming task is to understand your data and arrange that in a manner which is conducive to programming, the example data you provided looks like at least two different data items appended to each other - this is generally not good.  Also, please post test data in the form of a datastep as text in your posts, its not great to have to type all that in to test code:

data have;
  infile datalines dlm="¬";
  length long_string $2000;
  input long_string $;
datalines;
6
7
3
4
P Toolbox\Customer program\peer to peer
abc\ert\popt\abcdedgeg\defty
;
run;

proc sql noprint;
  select  max(countw(LONG_STRING,"\"))
  into    :NUM_ARRAY
  from    HAVE;
quit;

data want;
  set have;
  array col{&num_array.} $100;
  do i=1 to countw(long_string,"\");
    col{i}=scan(long_string,i,"\");
  end;
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 2580 views
  • 5 likes
  • 5 in conversation