I have a dataset which looks like this:
type (this is a character) | q_20211 (this is numeric) | q_20212(this is numeric) | q_20213(this is numeric) | last_code (this is numeric) | second_last_code (this is numeric) |
XX-1 |
1 | 2 | 3 | OO6;001 | OO6 |
XX-2 | 1 | 1 | 2 | K04;OO6 | 008 |
XX-3 | 1 | 3 | 1 | 002;001 | 009 |
XX-4 | 3 | 4 | 1 | OO5;004 | 004;002 |
XX-5 | 4 | 2 | 1 | 001;K04 | 001 |
XX-6 | 4 | 3 | 1 | 008;001 | 002 |
XX-7 | 2 | 5 | 1 | 001;001 | 005 |
I want to create a new column named `code_checker` which checks the column `last_code`:
PLEASTE NOTE: The values in `last_code` and `second_last_code` contain a `;` as delimiter, this is something I cannot and I don't want to change.
I tried to make above data reproduceable for you so you can test it but I could not use the `;` as delimiter inside `last_code` and `second_code`. Please keep that in mind, the real data contains `;` instead of `,`
data have;
infile datalines truncover;
input type $ q_20211 q_20212 q_20213 last_code $ second_last_code $;
datalines;
XX-1 1 2 3 oo6,001 oo6
XX-2 1 1 2 k04,oo6 008
XX-3 1 3 1 002,001 009
XX-4 3 4 1 oo5,004 004,002
XX-5 4 2 1 001,k04 001
XX-6 4 3 1 008,001 002
XX-7 2 5 3 001,001,004 005
My desired output:
type (this is a character) | q_20211 (this is numeric) | q_20212(this is numeric) | q_20213(this is numeric) | last_code (this is numeric) | second_last_code (this is numeric) | code_cheker |
XX-1 |
1 | 2 | 3 | OO6;001 | OO6 | |
XX-2 | 1 | 1 | 2 | K04;OO6 | 008 | RED |
XX-3 | 1 | 3 | 1 | 002;001 | 009 | GREEN |
XX-4 | 3 | 4 | 1 | OO5;004 | 004;002 | |
XX-5 | 4 | 2 | 1 | 001;K04 | 001 | RED |
XX-6 | 4 | 3 | 1 | 008;001 | 002 | |
XX-7 | 2 | 5 | 1 | 001;001 | 005 |
The presence of semicolons in the data can be alleviated by using the DATALINES4 statement and 4 consecutive semicolons to end the datalines block.
For finding substrings in delimited strings, use the FINDW function:
data have;
infile datalines truncover;
input type $ q_20211 q_20212 q_20213 last_code :$11. second_last_code $;
datalines4;
XX-1 1 2 3 oo6;001 oo6
XX-2 1 1 2 k04;oo6 008
XX-3 1 3 1 002;001 009
XX-4 3 4 1 oo5;004 004,002
XX-5 4 2 1 001;k04 001
XX-6 4 3 1 008;001 002
XX-7 2 5 3 001;001;004 005
;;;;
data want;
set have;
length code_checker $5;
if findw(last_code,'k04','; ') then code_checker = "RED";
if findw(last_code,'002','; ') then code_checker = "GREEN";
run;
The presence of semicolons in the data can be alleviated by using the DATALINES4 statement and 4 consecutive semicolons to end the datalines block.
For finding substrings in delimited strings, use the FINDW function:
data have;
infile datalines truncover;
input type $ q_20211 q_20212 q_20213 last_code :$11. second_last_code $;
datalines4;
XX-1 1 2 3 oo6;001 oo6
XX-2 1 1 2 k04;oo6 008
XX-3 1 3 1 002;001 009
XX-4 3 4 1 oo5;004 004,002
XX-5 4 2 1 001;k04 001
XX-6 4 3 1 008;001 002
XX-7 2 5 3 001;001;004 005
;;;;
data want;
set have;
length code_checker $5;
if findw(last_code,'k04','; ') then code_checker = "RED";
if findw(last_code,'002','; ') then code_checker = "GREEN";
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.