Hello
I have a data set with 2 columns called ID and Vector.
Column "Vector" including concatenation of codes .
I want to create a new data set called wanted with a new column called "Ind" that classify each ID into one of two groups: Red or Black.
If field "Vector" contain at least one of the codes in macro variable black_list then Ind='Black'
else Ind='Green'
I also want to create new columns with binary values 1/0 of splitting column Vector.
The new columns names will be :Code_23, Code_48,Code_65 and so on.
The order of the columns will be by the numbers order (3,6,12,13 and so on)
What is the way to do it please?
Data have;
Input ID Vector $20.;
Cards;
1 23,48,65,19,82,17,15
2 16
3 98,76,13,19,25
4 87,12,52,76,24
5 3,28
6 6,16
7 19,82,25
8 28,98,26
9 76,43,25
10 43,65,87,98,12
;
Run;
%let black_list=25,19,18;
For the first, do this:
Data have;
Input ID Vector $20.;
Cards;
1 23,48,65,19,82,17,15
2 16
3 98,76,13,19,25
4 87,12,52,76,24
5 3,28
6 6,16
7 19,82,25
8 28,98,26
9 76,43,25
10 43,65,87,98,12
;
%let black_list=25,19,18;
data green_black;
set have;
ind = "green";
do i = 1 to countw("&black_list.",",");
if index(vector,scan("&black_list.",i,",")) then ind = "black";
end;
drop i;
run;
For the second, transpose to long, create a sorted template, and transpose back to wide:
data long;
set have;
length code $7;
value = 1;
do i = 1 to countw(vector,",");
code = "Code_" !! scan(vector,i,",");
output;
end;
drop i vector;
run;
proc sort data=long (keep=code) out=template nodup;
by code;
run;
data long2;
set
template
long
;
run;
proc transpose data=long out=wide (drop=_: where=(id ne .));
by id;
var value;
id code;
run;
data want;
set wide;
array nums _numeric_;
do over nums;
nums = coalesce(nums,0);
end;
run;
Consider the advantages of creating variables for all 99 codes. (You can adjust this if there are more than 99 possible codes.)
The programming will be easier. If you have multiple batches of data, they will all contain the same variables. And if you really want to eliminate codes which do not appear in the data, you can always do that later.
Here is the idea:
data want;
set have;
array codes {99} code_1 - code_99;
do _n_=1 to 99;
codes{_n_} = 0;
end;
do _n_=1 to countw(vector, ',');
codes{input(scan(vector, _n_, ','), 2.)} = 1;
end;
ind='green';
do _n_=&blacklist;
if codes{_n_}=1 then ind='black';
end;
run;
Now you're done ... both objectives ... unless you still want to remove codes that are always 0. If so, continue with:
proc summary data=want;
var code_1 - code_99;
output out=stats sum=;
run;
data _null_;
set stats;
length droplist $ 1000;
array codes {99} code_1-code_99;
do _n_=1 to 99;
if codes{_n_}=0 then droplist = catt (droplist, vname(codes{_n_}));
end;
call symputx('droplist', droplist);
end;
data really_want;
set want (drop=(&droplist));
run;
The code is all untested, so write back if you encounter any difficulty with testing and debugging.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.