I have a dataset which looks like this:
type (this is a character)
q_20211 (this is numeric)
q_20212(this is numeric)
q_20213(this is numeric)
last_code (this is numeric)
second_last_code (this is numeric)
XX-1
1
2
3
OO6;001
OO6
XX-2
1
1
2
K04;OO6
008
XX-3
1
3
1
002;001
009
XX-4
3
4
1
OO5;004
004;002
XX-5
4
2
1
001;K04
001
XX-6
4
3
1
008;001
002
XX-7
2
5
1
001;001
005
I want to create a new column named `code_checker` which checks the column `last_code`:
If `last_code` contains the value `k04` somewhere I want the value in `code_checker` to be `RED`
If `last_code` contains the value `002` somewhere I want the value in `code_checker` to be `GREEN`
PLEASTE NOTE: The values in `last_code` and `second_last_code` contain a `;` as delimiter, this is something I cannot and I don't want to change.
I tried to make above data reproduceable for you so you can test it but I could not use the `;` as delimiter inside `last_code` and `second_code`. Please keep that in mind, the real data contains `;` instead of `,`
data have;
infile datalines truncover;
input type $ q_20211 q_20212 q_20213 last_code $ second_last_code $;
datalines;
XX-1 1 2 3 oo6,001 oo6
XX-2 1 1 2 k04,oo6 008
XX-3 1 3 1 002,001 009
XX-4 3 4 1 oo5,004 004,002
XX-5 4 2 1 001,k04 001
XX-6 4 3 1 008,001 002
XX-7 2 5 3 001,001,004 005
My desired output:
type (this is a character)
q_20211 (this is numeric)
q_20212(this is numeric)
q_20213(this is numeric)
last_code (this is numeric)
second_last_code (this is numeric)
code_cheker
XX-1
1
2
3
OO6;001
OO6
XX-2
1
1
2
K04;OO6
008
RED
XX-3
1
3
1
002;001
009
GREEN
XX-4
3
4
1
OO5;004
004;002
XX-5
4
2
1
001;K04
001
RED
XX-6
4
3
1
008;001
002
XX-7
2
5
1
001;001
005
... View more