IF char column contain

Ronein · Posted 08-05-2021 06:38 AM

Hello

I have a data set with 2 columns called ID and Vector.

Column "Vector" including concatenation of codes .

I want to create a new data set called wanted with a new column called "Ind" that classify each ID into one of two groups: Red or Black.

If field "Vector" contain at least one of the codes in macro variable black_list then Ind='Black'

else Ind='Green'

I also want to create new columns with binary values 1/0 of splitting column Vector.

The new columns names will be :Code_23, Code_48,Code_65 and so on.

The order of the columns will be by the numbers order (3,6,12,13 and so on)

What is the way to do it please?

Data have;
Input ID Vector $20.;
Cards;
1 23,48,65,19,82,17,15
2 16
3 98,76,13,19,25
4 87,12,52,76,24
5 3,28
6 6,16
7 19,82,25
8 28,98,26
9 76,43,25
10 43,65,87,98,12
;
Run;


%let black_list=25,19,18;

Kurt_Bremser · Posted 08-05-2021 07:56 AM

For the first, do this:

Data have;
Input ID Vector $20.;
Cards;
1 23,48,65,19,82,17,15
2 16
3 98,76,13,19,25
4 87,12,52,76,24
5 3,28
6 6,16
7 19,82,25
8 28,98,26
9 76,43,25
10 43,65,87,98,12
;

%let black_list=25,19,18;

data green_black;
set have;
ind = "green";
do i = 1 to countw("&black_list.",",");
  if index(vector,scan("&black_list.",i,",")) then ind = "black";
end;
drop i;
run;

For the second, transpose to long, create a sorted template, and transpose back to wide:

data long;
set have;
length code $7;
value = 1;
do i = 1 to countw(vector,",");
  code = "Code_" !! scan(vector,i,",");
  output;
end;
drop i vector;
run;

proc sort data=long (keep=code) out=template nodup;
by code;
run;

data long2;
set
  template
  long
;
run;

proc transpose data=long out=wide (drop=_: where=(id ne .));
by id;
var value;
id code;
run;

data want;
set wide;
array nums _numeric_;
do over nums;
  nums = coalesce(nums,0);
end;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Astounding · Posted 08-05-2021 09:51 AM

Consider the advantages of creating variables for all 99 codes. (You can adjust this if there are more than 99 possible codes.)

The programming will be easier. If you have multiple batches of data, they will all contain the same variables. And if you really want to eliminate codes which do not appear in the data, you can always do that later.

Here is the idea:

data want;
   set have;
   array codes {99} code_1 - code_99;
   do _n_=1 to 99;
       codes{_n_} = 0;
   end;
   do _n_=1 to countw(vector, ',');
      codes{input(scan(vector, _n_, ','), 2.)} = 1;
   end;
   ind='green';
   do _n_=&blacklist;
      if codes{_n_}=1 then ind='black';
   end;
run;

Now you're done ... both objectives ... unless you still want to remove codes that are always 0. If so, continue with:

proc summary data=want;
   var code_1 - code_99;
   output out=stats sum=;
run;
data _null_;
   set stats;
   length droplist $ 1000;
   array codes {99} code_1-code_99;
   do _n_=1 to 99;
      if codes{_n_}=0 then droplist = catt (droplist, vname(codes{_n_}));
   end;
   call symputx('droplist', droplist);
end;
data really_want;
   set want (drop=(&droplist));
run;

The code is all untested, so write back if you encounter any difficulty with testing and debugging.

IF char column contain

Re: IF char column contain

Re: IF char column contain

IF char column contain

Re: IF char column contain

Re: IF char column contain

Click image to register for webinar

Classroom Training Available!