Good afternoon All.
I have a table with a singular column which contains a variety of character strings delimited by a colon.
Example:
Data NewData;
length ReasonCode $255.;
Input REASONCODE $;
datalines;
aaa:bbb:ccc:ddd:eee
eee:aaa:fff:mmm
ddd:rrr:yyy:zzz
xxx:ggg
;
run;
No what I want to do is split out the string by the delimiter which I believe I had as per below. This creates Var1, 2, 3 etc from the string and those are populated with aaa, bbb, ccc etc
data NextStep ;
set NewData;
array var(40) $45.;
i=1;
do until (scan(ReasonCode,i,":") eq "");
var(i)=scan(ReasonCode,i,":");
i+1;
end;
run;
Thus creating the following dataset:
REASONCODE var1 var2 var3 var4 var5
aaa:bbb:ccc:ddd:eee aaa bbb ccc ddd eee
eee:aaa:fff:mmm eee aaa fff mmm
ddd:rrr:yyy:zzz ddd rrr yyy zzz
xxx:ggg xxx ggg
Now where I'm getting stuck on logic/coding is that I want to take every iteration of new variable such as aaa, bbb, ccc and make it a Column Heading and count the number of times this appears in the data?
REASONCODE aaa bbb ccc ddd eee ggg fff mmm rrr xxx yyy zzz
aaa:bbb:ccc:ddd:eee 1 1 1 1 1 0 0 0 0 0 0 0
eee:aaa:fff:mmm 1 0 0 0 1 0 1 1 0 0 0 0
ddd:rrr:yyy:zzz 0 0 0 1 0 0 0 0 1 0 1 1
xxx:ggg 0 0 0 0 0 1 0 0 0 1 0 0
Appreciate any help that you can provide.
Data NewData;
length ReasonCode $255.;
Input REASONCODE $;
datalines;
aaa:bbb:ccc:ddd:eee
eee:aaa:fff:mmm
ddd:rrr:yyy:zzz
xxx:ggg
;
run;
data temp;
set newdata;
do _n_=1 to countw(ReasonCode,':');
temp=scan(ReasonCode,_n_,':');
output;
end;
run;
proc freq data=temp ;
tables ReasonCode*temp/noprint sparse out=temp2(drop=percent);;
run;
proc transpose data=temp2 out=want(drop=_:);
by ReasonCode;
id temp;
var count;
run;
If the variable order is not important
Data NewData;
length ReasonCode $255.;
Input REASONCODE $;
datalines;
aaa:bbb:ccc:ddd:eee
eee:aaa:fff:mmm
ddd:rrr:yyy:zzz
xxx:ggg
;
run;
data temp;
set NewData;
retain c 1;
do i = 1 to countw(ReasonCode, ':');
w = scan(ReasonCode, i, ':');
output;
end;
run;
proc transpose data=temp out=want(drop=_:);
by ReasonCode notsorted;
id w;
var c;
run;
@Doyleuk1983 wrote:
If I were to add in further data as follows:
Data NewData;
length RSAP_ReasonCode $255.;
Input RSAP_REASONCODE $;
datalines;
aaa:bbb:ccc:ddd:eee
eee:aaa:fff:mmm
ddd:rrr:yyy:zzz
xxx:ggg
aaa:bbb:ccc:ddd:eee:fff
aaa:bbb:ccc:ddd:eee:ggg
aaa:bbb:ccc:ddd:eee:hhh
ddd:rrr:yyy:zzz
aaa:bbb:ccc:ddd:eee
eee:aaa:fff:mmm
aaa:bbb:ccc:ddd:eee:ggg
aaa:bbb:ccc:ddd:eee:ggg
;
run;
the solution doesn't work and produces:
ERROR: The ID value "aaa" occurs twice in the same BY group.
ERROR: The ID value "bbb" occurs twice in the same BY group.
ERROR: The ID value "ccc" occurs twice in the same BY group.
ERROR: The ID value "ddd" occurs twice in the same BY group.
ERROR: The ID value "eee" occurs twice in the same BY group.
ERROR: The ID value "ggg" occurs twice in the same BY group.
You cannot have duplicate ID for BY values unless you use the LET option on the procedure.
Not going to say that will fix everything but the errors will go away and you can examine the outputl
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.