BookmarkSubscribeRSS Feed
Ronein
Meteorite | Level 14

Hello

I have a data set with 2 columns called ID and Vector.

Column "Vector" including concatenation of codes .

I want to create a new data set called wanted with a new column called "Ind" that classify each ID into one of two groups: Red or Black.

If field "Vector" contain at least one of the codes in macro variable black_list then Ind='Black'

else Ind='Green'

I also want to create new columns with binary values 1/0 of splitting column Vector.

The new columns names will be :Code_23, Code_48,Code_65 and so on.

The order of the columns will be by the numbers order (3,6,12,13  and so on)

What is the way to do it please? 

 

 

Data have;
Input ID Vector $20.;
Cards;
1 23,48,65,19,82,17,15
2 16
3 98,76,13,19,25
4 87,12,52,76,24
5 3,28
6 6,16
7 19,82,25
8 28,98,26
9 76,43,25
10 43,65,87,98,12
;
Run;


%let black_list=25,19,18;
 

 

2 REPLIES 2
Kurt_Bremser
Super User

For the first, do this:

Data have;
Input ID Vector $20.;
Cards;
1 23,48,65,19,82,17,15
2 16
3 98,76,13,19,25
4 87,12,52,76,24
5 3,28
6 6,16
7 19,82,25
8 28,98,26
9 76,43,25
10 43,65,87,98,12
;

%let black_list=25,19,18;

data green_black;
set have;
ind = "green";
do i = 1 to countw("&black_list.",",");
  if index(vector,scan("&black_list.",i,",")) then ind = "black";
end;
drop i;
run;

For the second, transpose to long, create a sorted template, and transpose back to wide:

data long;
set have;
length code $7;
value = 1;
do i = 1 to countw(vector,",");
  code = "Code_" !! scan(vector,i,",");
  output;
end;
drop i vector;
run;

proc sort data=long (keep=code) out=template nodup;
by code;
run;

data long2;
set
  template
  long
;
run;

proc transpose data=long out=wide (drop=_: where=(id ne .));
by id;
var value;
id code;
run;

data want;
set wide;
array nums _numeric_;
do over nums;
  nums = coalesce(nums,0);
end;
run;
Astounding
PROC Star

Consider the advantages of creating variables for all 99 codes.  (You can adjust this if there are more than 99 possible codes.)

 

The programming will be easier.  If you have multiple batches of data, they will all contain the same variables.  And if you really want to eliminate codes which do not appear in the data, you can always do that later.

 

Here is the idea:

 

data want;
   set have;
   array codes {99} code_1 - code_99;
   do _n_=1 to 99;
       codes{_n_} = 0;
   end;
   do _n_=1 to countw(vector, ',');
      codes{input(scan(vector, _n_, ','), 2.)} = 1;
   end;
   ind='green';
   do _n_=&blacklist;
      if codes{_n_}=1 then ind='black';
   end;
run;

Now you're done ... both objectives ... unless you still want to remove codes that are always 0.  If so, continue with:

proc summary data=want;
   var code_1 - code_99;
   output out=stats sum=;
run;
data _null_;
   set stats;
   length droplist $ 1000;
   array codes {99} code_1-code_99;
   do _n_=1 to 99;
      if codes{_n_}=0 then droplist = catt (droplist, vname(codes{_n_}));
   end;
   call symputx('droplist', droplist);
end;
data really_want;
   set want (drop=(&droplist));
run;

The code is all untested, so write back if you encounter any difficulty with testing and debugging.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 390 views
  • 0 likes
  • 3 in conversation