BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Andalusia
Obsidian | Level 7

I have a dataset which looks like this:

 

type (this is a character) q_20211 (this is numeric) q_20212(this is numeric) q_20213(this is numeric) last_code (this is numeric) second_last_code (this is numeric)

XX-1

1 2 3 OO6;001 OO6
XX-2 1 1 2 K04;OO6 008
XX-3 1 3 1 002;001 009
XX-4 3 4 1 OO5;004 004;002
XX-5 4 2 1 001;K04 001
XX-6 4 3 1 008;001 002
XX-7 2 5 1 001;001 005

 

I want to create a new column named `code_checker` which  checks the column `last_code`:

  • If `last_code` contains the value `k04` somewhere I want the value in `code_checker` to be `RED`
  • If `last_code` contains the value `002`  somewhere I want the value in `code_checker` to be `GREEN`

PLEASTE NOTE: The values in `last_code` and `second_last_code` contain a `;` as delimiter, this is something I cannot and I don't want to change.

 

I tried to make above data reproduceable for you so you can test it but I could not use the `;` as delimiter inside `last_code` and `second_code`. Please keep that in mind, the real data contains `;` instead of `,`

data have;
infile datalines truncover;
input type $ q_20211 q_20212 q_20213 last_code $ second_last_code $;
datalines;
XX-1 1 2 3 oo6,001 oo6
XX-2 1 1 2 k04,oo6 008
XX-3 1 3 1 002,001 009
XX-4 3 4 1 oo5,004 004,002
XX-5 4 2 1 001,k04 001
XX-6 4 3 1 008,001 002
XX-7 2 5 3 001,001,004 005


My desired output:

type (this is a character) q_20211 (this is numeric) q_20212(this is numeric) q_20213(this is numeric) last_code (this is numeric) second_last_code (this is numeric) code_cheker

XX-1

1 2 3 OO6;001 OO6  
XX-2 1 1 2 K04;OO6 008 RED
XX-3 1 3 1 002;001 009 GREEN
XX-4 3 4 1 OO5;004 004;002  
XX-5 4 2 1 001;K04 001 RED
XX-6 4 3 1 008;001 002  
XX-7 2 5 1 001;001 005  

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

The presence of semicolons in the data can be alleviated by using the DATALINES4 statement and 4 consecutive semicolons to end the datalines block.

 

For finding substrings in delimited strings, use the FINDW function:

data have;
infile datalines truncover;
input type $ q_20211 q_20212 q_20213 last_code :$11. second_last_code $;
datalines4;
XX-1 1 2 3 oo6;001 oo6
XX-2 1 1 2 k04;oo6 008
XX-3 1 3 1 002;001 009
XX-4 3 4 1 oo5;004 004,002
XX-5 4 2 1 001;k04 001
XX-6 4 3 1 008;001 002
XX-7 2 5 3 001;001;004 005
;;;;

data want;
set have;
length code_checker $5;
if findw(last_code,'k04','; ') then code_checker = "RED";
if findw(last_code,'002','; ') then code_checker = "GREEN";
run;

View solution in original post

1 REPLY 1
Kurt_Bremser
Super User

The presence of semicolons in the data can be alleviated by using the DATALINES4 statement and 4 consecutive semicolons to end the datalines block.

 

For finding substrings in delimited strings, use the FINDW function:

data have;
infile datalines truncover;
input type $ q_20211 q_20212 q_20213 last_code :$11. second_last_code $;
datalines4;
XX-1 1 2 3 oo6;001 oo6
XX-2 1 1 2 k04;oo6 008
XX-3 1 3 1 002;001 009
XX-4 3 4 1 oo5;004 004,002
XX-5 4 2 1 001;k04 001
XX-6 4 3 1 008;001 002
XX-7 2 5 3 001;001;004 005
;;;;

data want;
set have;
length code_checker $5;
if findw(last_code,'k04','; ') then code_checker = "RED";
if findw(last_code,'002','; ') then code_checker = "GREEN";
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

Creating Custom Steps in SAS Studio

Check out this tutorial series to learn how to build your own steps in SAS Studio.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 1 reply
  • 595 views
  • 0 likes
  • 2 in conversation