For example, I have a data set, with column name school subjects students are interested in. The old column would look like this:
School Subjects:
Health science and Math
Biology
Biology and Geology
Arts and Computer Science
Geology and Health Science
Math
My goal, is to extract the second school subject (ie Math or Geology) and put it in a new column to create the desired result:
School Subject 1 School Subject 2
Health Science Math
Biology
Biology Geology
Arts Computer Science
Geology Health Science
Math
My solution was to create 2 conversion types, based on "and" and use if and then statement to separate the subjects:
Conversiontype1 = scan(school_subjects, -2, " ");
Conversiontype2 = scan(school_subjects, 2, " ");
If conversiontype1 = "and" then school_subject1 = scan(school_subjects, -1, " ");
Else if conversiontype2 = "and" then school_subject2 = catx(" ", scan(school_subjects, -2), scan(school_subjects, -1));
else school_subject1 = school_subjects;
run;
Is there a simpler way of doing this?
One way. Note provided example data as a data step.
The findw function returns the position a word is found, or 0 if not, so you can adjust position to select characters by offsetting from that value when found. NOTE: you do not want the FIND function which would find the "and" as part of other words like "hand". If the spelling is inconsistent , i.e. sometimes 'and' or sometimes 'And' you can add a parameter to the findw to ignore case: Findw(source,'and',,'i')
data have; input Subject $40.; datalines; Health science and Math Biology Biology and Geology Arts and Computer Science Geology and Health Science Math ; data want; set have; length subject2 $ 40; if findw(subject,'and')>0 then do; subject2 = substr(subject,findw(subject,'and')+4); subject = substr(subject,1,findw(subject,'and')-1); end; run;
The Input was to have some data that code can be tested with.
You didn't even provide an actual workable variable name so I had to make something.
Did you run the code? Did it do what you requested with those values?
Use YOUR data set on the SET statement. Use your variable name instead of "subject".
Then see what happens.
@A-junamu_- wrote:
Yes it did, but my original question was looking for a simpler way, if you bothered to read it. What a useless response.
Define simpler.
Your solution uses two additional variables.
Mine uses no additional variables.
Yours fails if the second subject following "and" is more than two words, mine doesn't.
Your post title says "faster" not "simpler". I'm not going to bother to create data set large enough to get performance statistics for comparing "faster" or not. Because when I test your code snippet, against the data set I provide code for, your School_subject1 does not get assigned for the examples "Arts and Computer Science" and "Geology and Health Science". Your code:
Else if conversiontype2 = "and" then school_subject2 = catx(" ", scan(school_subjects, -2), scan(school_subjects, -1));
else school_subject1 = school_subjects;
fails to make an assignment to School_subject1 when the Conversiontype2= "and" is true. Just plain fails. So does not result in two variables each with one subject.
You did not provide full code for your data step so it may be that was accomplished elsewhere.
Or a basic 2 line solution into 2 new variables for the example data set.
data want; set have; sub1= scan(tranwrd(subject," and ",'+'),1,'+'); sub2= scan(tranwrd(subject," and ",'+'),2,'+'); run;
Please be very cautious about calling responses "useless" on this forum.
@A-junamu_- wrote:
Yes it did, but my original question was looking for a simpler way, if you bothered to read it. What a useless response.
That's not a great response when your code does not work, you don't understand the basic example data post, and you're asking for free help from perfect strangers in a new forum.
Why ? @ballardw has already given you the excellent solution .
If you really want take ' and ' a dlmstr,Here is the one :
data have; input Subject $40.; datalines; Health science and Math Biology Biology and Geology Arts and Computer Science Geology and Health Science Math ; filename x temp; data _null_; file x; set have; put subject; ; data want; infile x dlmstr=' and ' truncover; input subject1 : $100. subject2 : $100. ; run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.