I have a dataset which looks like this:
type (this is a character) | q_20211 (this is numeric) | q_20212(this is numeric) | q_20213(this is numeric) | last_code (this is numeric) | second_last_code (this is numeric) |
XX-1 |
1 | 2 | 3 | OO6;001 | OO6 |
XX-2 | 1 | 1 | 2 | K04;OO6 | 008 |
XX-3 | 1 | 3 | 1 | 002;001 | 009 |
XX-4 | 3 | 4 | 1 | OO5;004 | 004;002 |
XX-5 | 4 | 2 | 1 | 001;K04 | 001 |
XX-6 | 4 | 3 | 1 | 008;001 | 002 |
XX-7 | 2 | 5 | 1 | 001;001 | 005 |
I want to create a new column named `code_checker` which checks the column `last_code`:
PLEASTE NOTE: The values in `last_code` and `second_last_code` contain a `;` as delimiter, this is something I cannot and I don't want to change.
I tried to make above data reproduceable for you so you can test it but I could not use the `;` as delimiter inside `last_code` and `second_code`. Please keep that in mind, the real data contains `;` instead of `,`
data have;
infile datalines truncover;
input type $ q_20211 q_20212 q_20213 last_code $ second_last_code $;
datalines;
XX-1 1 2 3 oo6,001 oo6
XX-2 1 1 2 k04,oo6 008
XX-3 1 3 1 002,001 009
XX-4 3 4 1 oo5,004 004,002
XX-5 4 2 1 001,k04 001
XX-6 4 3 1 008,001 002
XX-7 2 5 3 001,001,004 005
My desired output:
type (this is a character) | q_20211 (this is numeric) | q_20212(this is numeric) | q_20213(this is numeric) | last_code (this is numeric) | second_last_code (this is numeric) | code_cheker |
XX-1 |
1 | 2 | 3 | OO6;001 | OO6 | |
XX-2 | 1 | 1 | 2 | K04;OO6 | 008 | RED |
XX-3 | 1 | 3 | 1 | 002;001 | 009 | GREEN |
XX-4 | 3 | 4 | 1 | OO5;004 | 004;002 | |
XX-5 | 4 | 2 | 1 | 001;K04 | 001 | RED |
XX-6 | 4 | 3 | 1 | 008;001 | 002 | |
XX-7 | 2 | 5 | 1 | 001;001 | 005 |
The presence of semicolons in the data can be alleviated by using the DATALINES4 statement and 4 consecutive semicolons to end the datalines block.
For finding substrings in delimited strings, use the FINDW function:
data have;
infile datalines truncover;
input type $ q_20211 q_20212 q_20213 last_code :$11. second_last_code $;
datalines4;
XX-1 1 2 3 oo6;001 oo6
XX-2 1 1 2 k04;oo6 008
XX-3 1 3 1 002;001 009
XX-4 3 4 1 oo5;004 004,002
XX-5 4 2 1 001;k04 001
XX-6 4 3 1 008;001 002
XX-7 2 5 3 001;001;004 005
;;;;
data want;
set have;
length code_checker $5;
if findw(last_code,'k04','; ') then code_checker = "RED";
if findw(last_code,'002','; ') then code_checker = "GREEN";
run;
The presence of semicolons in the data can be alleviated by using the DATALINES4 statement and 4 consecutive semicolons to end the datalines block.
For finding substrings in delimited strings, use the FINDW function:
data have;
infile datalines truncover;
input type $ q_20211 q_20212 q_20213 last_code :$11. second_last_code $;
datalines4;
XX-1 1 2 3 oo6;001 oo6
XX-2 1 1 2 k04;oo6 008
XX-3 1 3 1 002;001 009
XX-4 3 4 1 oo5;004 004,002
XX-5 4 2 1 001;k04 001
XX-6 4 3 1 008;001 002
XX-7 2 5 3 001;001;004 005
;;;;
data want;
set have;
length code_checker $5;
if findw(last_code,'k04','; ') then code_checker = "RED";
if findw(last_code,'002','; ') then code_checker = "GREEN";
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.