BookmarkSubscribeRSS Feed
A-junamu_-
Fluorite | Level 6

For example, I have a data set, with column name school subjects students are interested in. The old column would look like this:

 

School Subjects:

Health science and Math

Biology

Biology and Geology

Arts and Computer Science

Geology and Health Science

Math

 

My goal, is to extract the second school subject (ie Math or Geology) and put it in a new column to create the desired result:

 

School Subject 1                 School Subject 2

Health Science                      Math

Biology

Biology                                  Geology

Arts                                       Computer Science

Geology                                Health Science

Math

 

My solution was to create 2 conversion types, based on "and" and use if and then statement to separate the subjects:

Conversiontype1 = scan(school_subjects, -2, " ");

Conversiontype2 = scan(school_subjects, 2, " ");

If conversiontype1 = "and" then school_subject1 = scan(school_subjects, -1, " ");

Else if conversiontype2 = "and" then school_subject2 = catx(" ", scan(school_subjects, -2), scan(school_subjects, -1));

else school_subject1 = school_subjects;

run; 

 

Is there a simpler way of doing this?

7 REPLIES 7
ballardw
Super User

One way. Note provided example data as a data step.

The findw function returns the position a word is found, or 0 if not, so you can adjust position to select characters by offsetting from that value when found. NOTE: you do not want the FIND function which would find the "and" as part of other words like "hand". If the spelling is inconsistent , i.e. sometimes 'and' or sometimes 'And' you can add a parameter to the findw to ignore case: Findw(source,'and',,'i')

data have;
  input Subject $40.;
datalines;
Health science and Math
Biology
Biology and Geology
Arts and Computer Science
Geology and Health Science
Math
;

data want;
   set have;
   length subject2 $ 40;
   if findw(subject,'and')>0 then do;
      subject2 = substr(subject,findw(subject,'and')+4);
      subject   = substr(subject,1,findw(subject,'and')-1);
      
   end;
run;
A-junamu_-
Fluorite | Level 6
The problem is that it's a large data set, so I cannot use input. I need a code that can use "and" as a delimiter to separate the subjects.
ballardw
Super User

The Input was to have some data that code can be tested with.

You didn't even provide an actual workable variable name so I had to make something.

 

Did you run the code? Did it do what you requested with  those values?

 

Use YOUR data set on the SET statement. Use your variable name instead of "subject".

Then see what happens.

A-junamu_-
Fluorite | Level 6
Yes it did, but my original question was looking for a simpler way, if you bothered to read it. What a useless response.
ballardw
Super User

@A-junamu_- wrote:
Yes it did, but my original question was looking for a simpler way, if you bothered to read it. What a useless response.

Define simpler.

Your solution uses two additional variables.

Mine uses no additional variables.

 

Yours fails if the second subject following "and" is more than two words, mine doesn't.

 

Your post title says "faster" not "simpler". I'm not going to bother to create data set large enough to get performance statistics for comparing "faster" or not. Because when I test your code snippet, against the data set I provide code for, your School_subject1 does not get assigned for the examples "Arts and Computer Science" and "Geology and Health Science". Your code:

Else if conversiontype2 = "and" then school_subject2 = catx(" ", scan(school_subjects, -2), scan(school_subjects, -1));

else school_subject1 = school_subjects;

fails to make an assignment to School_subject1 when the Conversiontype2= "and" is true. Just plain fails. So does not result in two variables each with one subject.

 

You did not provide full code for your data step so it may be that was accomplished elsewhere. 

Or a basic 2 line solution into 2 new variables for the example data set.

data want;
   set have;
   sub1= scan(tranwrd(subject," and ",'+'),1,'+');
   sub2= scan(tranwrd(subject," and ",'+'),2,'+');
run; 

 

Please be very cautious about calling responses "useless" on this forum.

Reeza
Super User

@A-junamu_- wrote:
Yes it did, but my original question was looking for a simpler way, if you bothered to read it. What a useless response.

That's not a great response when your code does not work, you don't understand the basic example data post, and you're asking for free help from perfect strangers in a new forum. 

 

Ksharp
Super User

Why ? @ballardw  has already given you the excellent solution .

If you really want take ' and ' a dlmstr,Here is the one :

 

data have;
  input Subject $40.;
datalines;
Health science and Math
Biology
Biology and Geology
Arts and Computer Science
Geology and Health Science
Math
;

filename x temp;
data _null_;
file x;
set have;
put subject;
;

data want;
infile x dlmstr=' and ' truncover;
input subject1 : $100. subject2 : $100. ;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 892 views
  • 4 likes
  • 4 in conversation