I have a dataset which looks like this:
| type (this is a character) | q_20211 (this is numeric) | q_20212(this is numeric) | q_20213(this is numeric) | last_code (this is numeric) | second_last_code (this is numeric) |
|
XX-1 |
1 | 2 | 3 | OO6;001 | OO6 |
| XX-2 | 1 | 1 | 2 | K04;OO6 | 008 |
| XX-3 | 1 | 3 | 1 | 002;001 | 009 |
| XX-4 | 3 | 4 | 1 | OO5;004 | 004;002 |
| XX-5 | 4 | 2 | 1 | 001;K04 | 001 |
| XX-6 | 4 | 3 | 1 | 008;001 | 002 |
| XX-7 | 2 | 5 | 1 | 001;001 | 005 |
I want to create a new column named `code_checker` which checks the column `last_code`:
PLEASTE NOTE: The values in `last_code` and `second_last_code` contain a `;` as delimiter, this is something I cannot and I don't want to change.
I tried to make above data reproduceable for you so you can test it but I could not use the `;` as delimiter inside `last_code` and `second_code`. Please keep that in mind, the real data contains `;` instead of `,`
data have;
infile datalines truncover;
input type $ q_20211 q_20212 q_20213 last_code $ second_last_code $;
datalines;
XX-1 1 2 3 oo6,001 oo6
XX-2 1 1 2 k04,oo6 008
XX-3 1 3 1 002,001 009
XX-4 3 4 1 oo5,004 004,002
XX-5 4 2 1 001,k04 001
XX-6 4 3 1 008,001 002
XX-7 2 5 3 001,001,004 005
My desired output:
| type (this is a character) | q_20211 (this is numeric) | q_20212(this is numeric) | q_20213(this is numeric) | last_code (this is numeric) | second_last_code (this is numeric) | code_cheker |
|
XX-1 |
1 | 2 | 3 | OO6;001 | OO6 | |
| XX-2 | 1 | 1 | 2 | K04;OO6 | 008 | RED |
| XX-3 | 1 | 3 | 1 | 002;001 | 009 | GREEN |
| XX-4 | 3 | 4 | 1 | OO5;004 | 004;002 | |
| XX-5 | 4 | 2 | 1 | 001;K04 | 001 | RED |
| XX-6 | 4 | 3 | 1 | 008;001 | 002 | |
| XX-7 | 2 | 5 | 1 | 001;001 | 005 |
The presence of semicolons in the data can be alleviated by using the DATALINES4 statement and 4 consecutive semicolons to end the datalines block.
For finding substrings in delimited strings, use the FINDW function:
data have;
infile datalines truncover;
input type $ q_20211 q_20212 q_20213 last_code :$11. second_last_code $;
datalines4;
XX-1 1 2 3 oo6;001 oo6
XX-2 1 1 2 k04;oo6 008
XX-3 1 3 1 002;001 009
XX-4 3 4 1 oo5;004 004,002
XX-5 4 2 1 001;k04 001
XX-6 4 3 1 008;001 002
XX-7 2 5 3 001;001;004 005
;;;;
data want;
set have;
length code_checker $5;
if findw(last_code,'k04','; ') then code_checker = "RED";
if findw(last_code,'002','; ') then code_checker = "GREEN";
run;
The presence of semicolons in the data can be alleviated by using the DATALINES4 statement and 4 consecutive semicolons to end the datalines block.
For finding substrings in delimited strings, use the FINDW function:
data have;
infile datalines truncover;
input type $ q_20211 q_20212 q_20213 last_code :$11. second_last_code $;
datalines4;
XX-1 1 2 3 oo6;001 oo6
XX-2 1 1 2 k04;oo6 008
XX-3 1 3 1 002;001 009
XX-4 3 4 1 oo5;004 004,002
XX-5 4 2 1 001;k04 001
XX-6 4 3 1 008;001 002
XX-7 2 5 3 001;001;004 005
;;;;
data want;
set have;
length code_checker $5;
if findw(last_code,'k04','; ') then code_checker = "RED";
if findw(last_code,'002','; ') then code_checker = "GREEN";
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.