BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Uddin
Fluorite | Level 6

Hi,

Could you please help me regarding the above-mentioned issue. I have data set and I want to categorize the variable tectonic_setting into two new variables:
•Plate_type: Intrastate, Rift zone, Subsection zone
•Crust_type: Continental crust, Intermediate crust, Ocean crust

I used the following code:

data want;
set have;
Plate_type = SCAN(tectonic_setting,1);
Crust_type = SCAN(tectonic_setting,3);
run;

It works partially i.e. some part is coming and some part is not coming as a output.

Here is the some data:

Tectonic_Setting

Rift Zone / Continental Crust (>25 km)

Intrastate / Continental Crust (>25 km)

Subsection Zone / Continental Crust (>25 km)

Rift Zone / Intermediate Crust (15-25 km)

Rift Zone / Oceanic Crust (< 15 km)

 

I am also giving a portion of data set as an attachment

 

Looking for your kind response.

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Knowing delimiters to use with Scan likely is all you need:

 

Try this. The first data step is just to create example data.

data have;
   infile datalines truncover;
   input  Tectonic_Setting $50.;
datalines;
Rift Zone / Continental Crust (>25 km)
Intrastate / Continental Crust (>25 km)
Subsection Zone / Continental Crust (>25 km)
Rift Zone / Intermediate Crust (15-25 km)
Rift Zone / Oceanic Crust (< 15 km)
;

data want;
    set have;
    length plate_type $ 16 Crust_type $ 19;
    plate_type = scan(tectonic_setting,1,'/');
    crust_type = strip(scan(tectonic_setting,2,'/('));
run;

You didn't describe your problem but from the code you used all of the default delimiters would have been used so instead of "Rift Zone" you would only get "Rift". It is extremely likely, that depending on actual order of data that the length assigned to the Plate_type variable wasn't long enough to hold all of "interstate". So first we set a length for the desired variables long enough to hold the longest expected value (shown in your list).

Then we specify a specific character for the Scan function so that we get the whole value. The first scan uses just the / character but the second specifies use of / and ( so that the depths don't get included. I also use Strip function to remove the leading space that would occur.

 

One thing to learn that gets very important when manipulating character values is that if you do not set lengths of variables SAS will set them based on the first result or use. So "short" values typically indicate a length wasn't set.

 

 

 

View solution in original post

2 REPLIES 2
ballardw
Super User

Knowing delimiters to use with Scan likely is all you need:

 

Try this. The first data step is just to create example data.

data have;
   infile datalines truncover;
   input  Tectonic_Setting $50.;
datalines;
Rift Zone / Continental Crust (>25 km)
Intrastate / Continental Crust (>25 km)
Subsection Zone / Continental Crust (>25 km)
Rift Zone / Intermediate Crust (15-25 km)
Rift Zone / Oceanic Crust (< 15 km)
;

data want;
    set have;
    length plate_type $ 16 Crust_type $ 19;
    plate_type = scan(tectonic_setting,1,'/');
    crust_type = strip(scan(tectonic_setting,2,'/('));
run;

You didn't describe your problem but from the code you used all of the default delimiters would have been used so instead of "Rift Zone" you would only get "Rift". It is extremely likely, that depending on actual order of data that the length assigned to the Plate_type variable wasn't long enough to hold all of "interstate". So first we set a length for the desired variables long enough to hold the longest expected value (shown in your list).

Then we specify a specific character for the Scan function so that we get the whole value. The first scan uses just the / character but the second specifies use of / and ( so that the depths don't get included. I also use Strip function to remove the leading space that would occur.

 

One thing to learn that gets very important when manipulating character values is that if you do not set lengths of variables SAS will set them based on the first result or use. So "short" values typically indicate a length wasn't set.

 

 

 

Uddin
Fluorite | Level 6
Hi Ballardw,
Thanks for your kind support and explanation about the code usable for the data. Yes, the code you provided is working perfectly. As a new user, I am learning and enjoying this platform. Thanks again.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 396 views
  • 1 like
  • 2 in conversation