BookmarkSubscribeRSS Feed
Doyleuk1983
Fluorite | Level 6

Good afternoon All.

 

I have a table with a singular column which contains a variety of character strings delimited by a colon.

Example:

Data NewData;

length ReasonCode $255.;

Input REASONCODE $;

datalines;

aaa:bbb:ccc:ddd:eee

eee:aaa:fff:mmm

ddd:rrr:yyy:zzz

xxx:ggg

;

run;

 

No what I want to do is split out the string by the delimiter which I believe I had as per below.  This creates Var1, 2, 3 etc from the string and those are populated with aaa, bbb, ccc etc

data NextStep ;

set NewData;

array var(40) $45.;

i=1;

do until (scan(ReasonCode,i,":") eq "");

var(i)=scan(ReasonCode,i,":");

i+1;

end;

run;

 

Thus creating the following dataset:

REASONCODE          var1  var2  var3  var4  var5
aaa:bbb:ccc:ddd:eee aaa   bbb   ccc   ddd   eee
eee:aaa:fff:mmm     eee   aaa   fff   mmm 
ddd:rrr:yyy:zzz     ddd   rrr   yyy   zzz 
xxx:ggg             xxx   ggg

 

Now where I'm getting stuck on logic/coding is that I want to take every iteration of new variable such as aaa, bbb, ccc and make it a Column Heading and count the number of times this appears in the data?

 

REASONCODE          aaa bbb ccc ddd eee ggg fff mmm rrr xxx yyy zzz
aaa:bbb:ccc:ddd:eee   1   1   1   1   1   0   0   0   0   0   0   0
eee:aaa:fff:mmm       1   0   0   0   1   0   1   1   0   0   0   0
ddd:rrr:yyy:zzz       0   0   0   1   0   0   0   0   1   0   1   1
xxx:ggg               0   0   0   0   0   1   0   0   0   1   0   0

 

Appreciate any help that you can provide.

5 REPLIES 5
novinosrin
Tourmaline | Level 20

Data NewData;

length ReasonCode $255.;

Input REASONCODE $;

datalines;
aaa:bbb:ccc:ddd:eee
eee:aaa:fff:mmm
ddd:rrr:yyy:zzz
xxx:ggg
;

run;

data temp;
 set newdata;
 do _n_=1 to countw(ReasonCode,':');
  temp=scan(ReasonCode,_n_,':');
  output;
 end;
run;
proc freq data=temp ;
 tables ReasonCode*temp/noprint sparse out=temp2(drop=percent);;
run;

proc transpose data=temp2 out=want(drop=_:);
 by ReasonCode;
 id temp;
 var count;
run;
Doyleuk1983
Fluorite | Level 6
I added in more data and not sure the solution is summing up correctly;

Data NewData;
length RSAP_ReasonCode $255.;
Input RSAP_REASONCODE $;
datalines;
aaa:bbb:ccc:ddd:eee
eee:aaa:fff:mmm
ddd:rrr:yyy:zzz
xxx:ggg
aaa:bbb:ccc:ddd:eee:fff
aaa:bbb:ccc:ddd:eee:ggg
aaa:bbb:ccc:ddd:eee:hhh
ddd:rrr:yyy:zzz
aaa:bbb:ccc:ddd:eee
eee:aaa:fff:mmm
aaa:bbb:ccc:ddd:eee:ggg
aaa:bbb:ccc:ddd:eee:ggg
;
run;
PeterClemmensen
Tourmaline | Level 20

If the variable order is not important

 

Data NewData;
length ReasonCode $255.;
Input REASONCODE $;
datalines;
aaa:bbb:ccc:ddd:eee
eee:aaa:fff:mmm
ddd:rrr:yyy:zzz
xxx:ggg
;
run;

data temp;
   set NewData;
   retain c 1;
   do i = 1 to countw(ReasonCode, ':');
      w = scan(ReasonCode, i, ':');
      output;
   end;
run;

proc transpose data=temp out=want(drop=_:);
   by ReasonCode notsorted;
   id w;
   var c;
run;
Doyleuk1983
Fluorite | Level 6
If I were to add in further data as follows:
Data NewData;
length RSAP_ReasonCode $255.;
Input RSAP_REASONCODE $;
datalines;
aaa:bbb:ccc:ddd:eee
eee:aaa:fff:mmm
ddd:rrr:yyy:zzz
xxx:ggg
aaa:bbb:ccc:ddd:eee:fff
aaa:bbb:ccc:ddd:eee:ggg
aaa:bbb:ccc:ddd:eee:hhh
ddd:rrr:yyy:zzz
aaa:bbb:ccc:ddd:eee
eee:aaa:fff:mmm
aaa:bbb:ccc:ddd:eee:ggg
aaa:bbb:ccc:ddd:eee:ggg
;
run;

the solution doesn't work and produces:

ERROR: The ID value "aaa" occurs twice in the same BY group.
ERROR: The ID value "bbb" occurs twice in the same BY group.
ERROR: The ID value "ccc" occurs twice in the same BY group.
ERROR: The ID value "ddd" occurs twice in the same BY group.
ERROR: The ID value "eee" occurs twice in the same BY group.
ERROR: The ID value "ggg" occurs twice in the same BY group.
ballardw
Super User

@Doyleuk1983 wrote:
If I were to add in further data as follows:
Data NewData;
length RSAP_ReasonCode $255.;
Input RSAP_REASONCODE $;
datalines;
aaa:bbb:ccc:ddd:eee
eee:aaa:fff:mmm
ddd:rrr:yyy:zzz
xxx:ggg
aaa:bbb:ccc:ddd:eee:fff
aaa:bbb:ccc:ddd:eee:ggg
aaa:bbb:ccc:ddd:eee:hhh
ddd:rrr:yyy:zzz
aaa:bbb:ccc:ddd:eee
eee:aaa:fff:mmm
aaa:bbb:ccc:ddd:eee:ggg
aaa:bbb:ccc:ddd:eee:ggg
;
run;

the solution doesn't work and produces:

ERROR: The ID value "aaa" occurs twice in the same BY group.
ERROR: The ID value "bbb" occurs twice in the same BY group.
ERROR: The ID value "ccc" occurs twice in the same BY group.
ERROR: The ID value "ddd" occurs twice in the same BY group.
ERROR: The ID value "eee" occurs twice in the same BY group.
ERROR: The ID value "ggg" occurs twice in the same BY group.

 

You cannot have duplicate ID for BY values unless you use the LET option on the procedure.

Not going to say that will fix everything but the errors will go away and you can examine the outputl

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 2596 views
  • 2 likes
  • 4 in conversation