Solved: Create a new table based on an older one

Andalusia · Posted 10-04-2021 03:51 AM

I have a dataset which looks like this:

type (this is a character)	q_20211 (this is numeric)	q_20212(this is numeric)	q_20213(this is numeric)	last_code (this is numeric)	second_last_code (this is numeric)
XX-1	1	2	3	OO6;001	OO6
XX-2	1	1	2	K04;OO6	008
XX-3	1	3	1	002;001	009
XX-4	3	4	1	OO5;004	004;002
XX-5	4	2	1	001;K04	001
XX-6	4	3	1	008;001	002
XX-7	2	5	1	001;001	005

I want to create a new column named `code_checker` which checks the column `last_code`:

If `last_code` contains the value `k04` somewhere I want the value in `code_checker` to be `RED`
If `last_code` contains the value `002` somewhere I want the value in `code_checker` to be `GREEN`

PLEASTE NOTE: The values in `last_code` and `second_last_code` contain a `;` as delimiter, this is something I cannot and I don't want to change.

I tried to make above data reproduceable for you so you can test it but I could not use the `;` as delimiter inside `last_code` and `second_code`. Please keep that in mind, the real data contains `;` instead of `,`

data have;
infile datalines truncover;
input type $ q_20211 q_20212 q_20213 last_code $ second_last_code $;
datalines;
XX-1 1 2 3 oo6,001 oo6
XX-2 1 1 2 k04,oo6 008
XX-3 1 3 1 002,001 009
XX-4 3 4 1 oo5,004 004,002
XX-5 4 2 1 001,k04 001
XX-6 4 3 1 008,001 002
XX-7 2 5 3 001,001,004 005

My desired output:

type (this is a character)	q_20211 (this is numeric)	q_20212(this is numeric)	q_20213(this is numeric)	last_code (this is numeric)	second_last_code (this is numeric)	code_cheker
XX-1	1	2	3	OO6;001	OO6
XX-2	1	1	2	K04;OO6	008	RED
XX-3	1	3	1	002;001	009	GREEN
XX-4	3	4	1	OO5;004	004;002
XX-5	4	2	1	001;K04	001	RED
XX-6	4	3	1	008;001	002
XX-7	2	5	1	001;001	005

Kurt_Bremser · Posted 10-04-2021 04:05 AM

The presence of semicolons in the data can be alleviated by using the DATALINES4 statement and 4 consecutive semicolons to end the datalines block.

For finding substrings in delimited strings, use the FINDW function:

data have;
infile datalines truncover;
input type $ q_20211 q_20212 q_20213 last_code :$11. second_last_code $;
datalines4;
XX-1 1 2 3 oo6;001 oo6
XX-2 1 1 2 k04;oo6 008
XX-3 1 3 1 002;001 009
XX-4 3 4 1 oo5;004 004,002
XX-5 4 2 1 001;k04 001
XX-6 4 3 1 008;001 002
XX-7 2 5 3 001;001;004 005
;;;;

data want;
set have;
length code_checker $5;
if findw(last_code,'k04','; ') then code_checker = "RED";
if findw(last_code,'002','; ') then code_checker = "GREEN";
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

Kurt_Bremser · Posted 10-04-2021 04:05 AM

The presence of semicolons in the data can be alleviated by using the DATALINES4 statement and 4 consecutive semicolons to end the datalines block.

For finding substrings in delimited strings, use the FINDW function:

data have;
infile datalines truncover;
input type $ q_20211 q_20212 q_20213 last_code :$11. second_last_code $;
datalines4;
XX-1 1 2 3 oo6;001 oo6
XX-2 1 1 2 k04;oo6 008
XX-3 1 3 1 002;001 009
XX-4 3 4 1 oo5;004 004,002
XX-5 4 2 1 001;k04 001
XX-6 4 3 1 008;001 002
XX-7 2 5 3 001;001;004 005
;;;;

data want;
set have;
length code_checker $5;
if findw(last_code,'k04','; ') then code_checker = "RED";
if findw(last_code,'002','; ') then code_checker = "GREEN";
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Create a new table based on an older one

Re: Create a new table based on an older one

Re: Create a new table based on an older one

Create a new table based on an older one

Re: Create a new table based on an older one

Re: Create a new table based on an older one

Registration is open

SAS Training: Just a Click Away