BookmarkSubscribeRSS Feed
newsas007
Quartz | Level 8

Hi,

I have a dataset with categories in different order. I want to create a consistency in the order as  [(i)cyt, (ii) end therapy, (iii) targ]. Also delete the (i),(ii), and (iii).

Different versions are [(ii)cyt, (iii) targ, (ii) end therapy], [(iii)end therapy, (i) cyt, (iii) targ], etc. Please see the data below. Thanks

 

Data Have:

ID Categories
1 (i)cyt, (ii) end therapy, (iii) targ
2 (ii)cyt, (iii) targ, (ii) end therapy
3 (iii)end therapy, (i) cyt, (iii) targ
4 iii) targ, (i) cyt, (ii) end therapy
5 (iii)end therapy, (i) cyt, (iii) targ
6 iii) targ, (i) cyt, (ii) end therapy
7 (i)cyt, (ii) end therapy, (iii) targ
8 (ii)cyt, (iii) targ, (ii) end therapy
9 (ii)cyt
10 (iii) targ
11  (ii) end therapy

 

Data Want:

ID Categories Clean_Category
1 (i)cyt, (ii) end therapy, (iii) targ Cyt, End Therapy, Targ
2 (ii)cyt, (iii) targ, (ii) end therapy Cyt, End Therapy, Targ
3 (iii)end therapy, (i) cyt, (iii) targ Cyt, End Therapy, Targ
4 iii) targ, (i) cyt, (ii) end therapy Cyt, End Therapy, Targ
5 (iii)end therapy, (i) cyt, (iii) targ Cyt, End Therapy, Targ
6 iii) targ, (i) cyt, (ii) end therapy Cyt, End Therapy, Targ
7 (i)cyt, (ii) end therapy, (iii) targ Cyt, End Therapy, Targ
8 (ii)cyt, (iii) targ, (ii) end therapy Cyt, End Therapy, Targ
9 (ii)cyt Cyt
10 (iii) targ Targ
11  (ii) end therapy End Therapy

 

data have;
   infile datalines dlm=':';
   input ID Categories :$100.;
 datalines;
1:(i)cyt,(ii)end therapy,(iii)targ
2:(ii)cyt,(iii)targ,(ii)end therapy
3:(iii)end therapy,(i)cyt,(iii)targ
4:iii)targ,(i)cyt,(ii)end therapy
5:(iii)end therapy,(i)cyt,(iii)targ
6:iii)targ,(i)cyt,(ii)end therapy
7:(i)cyt,(ii)end therapy,(iii)targ
8:(ii)cyt,(iii)targ,(ii)end therapy
9:(ii)cyt
10:(iii)targ
11:(ii)end therapy
;
Run;
10 REPLIES 10
ballardw
Super User

@newsas007 wrote:

Hi,

I have a dataset with categories in different order. I want to create a consistency in the order as  [(i)cyt, (ii) end therapy, (iii) targ]. Also delete the (i),(ii), and (iii).

Different versions are [(ii)cyt, (iii) targ, (ii) end therapy], [(iii)end therapy, (i) cyt, (iii) targ], etc. Please see the data below. Thanks

 

Data Have:

ID Categories
1 (i)cyt, (ii) end therapy, (iii) targ
2 (ii)cyt, (iii) targ, (ii) end therapy
3 (iii)end therapy, (i) cyt, (iii) targ
4 iii) targ, (i) cyt, (ii) end therapy
5 (iii)end therapy, (i) cyt, (iii) targ
6 iii) targ, (i) cyt, (ii) end therapy
7 (i)cyt, (ii) end therapy, (iii) targ
8 (ii)cyt, (iii) targ, (ii) end therapy
9 (ii)cyt
10 (iii) targ
11  (ii) end therapy

 

Data Want:

ID Categories Clean_Category
1 (i)cyt, (ii) end therapy, (iii) targ Cyt, End Therapy, Targ
2 (ii)cyt, (iii) targ, (ii) end therapy Cyt, End Therapy, Targ
3 (iii)end therapy, (i) cyt, (iii) targ Cyt, End Therapy, Targ
4 iii) targ, (i) cyt, (ii) end therapy Cyt, End Therapy, Targ
5 (iii)end therapy, (i) cyt, (iii) targ Cyt, End Therapy, Targ
6 iii) targ, (i) cyt, (ii) end therapy Cyt, End Therapy, Targ
7 (i)cyt, (ii) end therapy, (iii) targ Cyt, End Therapy, Targ
8 (ii)cyt, (iii) targ, (ii) end therapy Cyt, End Therapy, Targ
9 (ii)cyt Cyt
10 (iii) targ Targ
11  (ii) end therapy End Therapy

I would start with something like this (untested code a I don't have a data set to work with)

data want;
   set have;
   length Clean_category $ 30;
   if index(categories,'cyt')>0 then Clean_category='CYT';
   if index(categories,'end therapy')>0 then Clean_category=catx(', ',Clean_category,'End Therapy');
   if index(categories,'targ')>0 then Clean_category=catx(', ',Clean_category,'Targ');
run;
newsas007
Quartz | Level 8

Hi @ballardw 

Thanks, I have added a code for the dataset. 

ballardw
Super User

@newsas007 wrote:

Hi @ballardw 

Thanks, I have added a code for the dataset. 


My example code works with the data from that data set you provided.

PhilC
Rhodochrosite | Level 12

This turns out to be an educational problem for myself.  There is a format for converting numbers to roman numerals, there is no such informat, switching back.  Without a way to convert roman numerals to numbers, how will the categories be sorted?  I haven't given up, but I did not read that this was mentioned, so now its mentioned.

 

but cty goes to (i) and targ to (iii) always, even if the numeral is wrong, so ok that means it doesn't matter. 

 

Still, I think I should know how to read in roman numerals from a text file.

ballardw
Super User

@PhilC wrote:

This turns out to be an educational problem for myself.  There is a format for converting numbers to roman numerals, there is no such informat, switching back.  Without a way to convert roman numerals to numbers, how will the categories be sorted?  I haven't given up, but I did not read that this was mentioned, so now its mentioned.

 

but cty goes to (i) and targ to (iii) always, even if the numeral is wrong, so ok that means it doesn't matter. 

 

Still, I think I should know how to read in roman numerals from a text file.


What numbers? Your example data set does not contain any numbers.

Are you saying that your example data is not actually what your data looks like? If that is case maybe time to use the Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

 

Does the code I provided not provide the output requested for the given input example data set?

What sorting, you did not state there was any sorting, your example says that the values were in a desired order if present. So the Roman numeral portion is completely dropped.

From your original post: "in the order as [(i)cyt, (ii) end therapy, (iii) targ]. Also delete the (i),(ii), and (iii)." 

If something is supposed to be a different order you need to provide input examples and the output.

If you are worried about the Roman numerals then you have to tell us how big of a Roman numeral is involved and exactly how the numeric value is to be used in the process because this is the first place you mention that has any impact.

 

PhilC
Rhodochrosite | Level 12
No, I'm just saying I tried to look and solve this problem, before reading
your discourse. I noticed the Roman numerals. At the time I was expecting
to have to sort the items based on the Roman numerals...
ballardw
Super User

@PhilC wrote:
No, I'm just saying I tried to look and solve this problem, before reading
your discourse. I noticed the Roman numerals. At the time I was expecting
to have to sort the items based on the Roman numerals...

Sorry, I thought that was the OP changing things on us.

 

You are describing an application of what my first statistics prof called "Juolas Metatherom" which states "You will seldom misunderstand a problem so as to make it easier." The professor's name was Dr. Juola ...

Kurt_Bremser
Super User

You can make use of the fact that your categories sort alphabetically:

data have;
   infile datalines dlm=':';
   input ID Categories :$100.;
 datalines;
1:(i)cyt,(ii)end therapy,(iii)targ
2:(ii)cyt,(iii)targ,(ii)end therapy
3:(iii)end therapy,(i)cyt,(iii)targ
4:iii)targ,(i)cyt,(ii)end therapy
5:(iii)end therapy,(i)cyt,(iii)targ
6:iii)targ,(i)cyt,(ii)end therapy
7:(i)cyt,(ii)end therapy,(iii)targ
8:(ii)cyt,(iii)targ,(ii)end therapy
9:(ii)cyt
10:(iii)targ
11:(ii)end therapy
;

data want1;
set have;
do i = 1 to countw(categories,",");
  category = strip(scan(scan(categories,i,","),2,")"));
  output;
end;
keep id category;
run;

proc sort data=want1;
by id category;
run;

data want2;
set want1;
by id;
retain categories;
length categories $100;
if first.id then categories = "";
categories = catx(",",categories,category);
if last.id;
keep id categories;
run;
PhilC
Rhodochrosite | Level 12

before getting distracted by the roman numerals, I intended to make use of the SORT FUNCTION since I don't get to use it very much.  Anyway, following my own "Juolas" Metatheory I did not want to assume the categories would be alphabetical.  BUT maybe we can, and we can use SORT.:

data have;
   infile datalines dlm=':';
   input ID Categories :$100.;
 datalines;
1:(i)cyt,(ii)end therapy,(iii)targ
2:(ii)cyt,(iii)targ,(ii)end therapy
3:(iii)end therapy,(i)cyt,(iii)targ
4:iii)targ,(i)cyt,(ii)end therapy
5:(iii)end therapy,(i)cyt,(iii)targ
6:iii)targ,(i)cyt,(ii)end therapy
7:(i)cyt,(ii)end therapy,(iii)targ
8:(ii)cyt,(iii)targ,(ii)end therapy
9:(ii)cyt
10:(iii)targ
11:(ii)end therapy
;

data want1;
  set have;
  length cat1-cat3 $20;
    array cat[3] $ cat1-cat3;
  do i = 1 to 3;/*assuming 3 categories*/
    cat[i] = strip(scan(scan(categories,i,","),2,")"));
  end;
  _rc=sort(of cat1-cat3);
  category=catx(", ",of cat1-cat3);
  keep id category;
run;
Ksharp
Super User
data have;
   infile datalines dlm=':';
   input ID Categories :$100.;
 datalines;
1:(i)cyt,(ii)end therapy,(iii)targ
2:(ii)cyt,(iii)targ,(ii)end therapy
3:(iii)end therapy,(i)cyt,(iii)targ
4:iii)targ,(i)cyt,(ii)end therapy
5:(iii)end therapy,(i)cyt,(iii)targ
6:iii)targ,(i)cyt,(ii)end therapy
7:(i)cyt,(ii)end therapy,(iii)targ
8:(ii)cyt,(iii)targ,(ii)end therapy
9:(ii)cyt
10:(iii)targ
11:(ii)end therapy
;

data want;
 set have;
 want=prxchange('s/\(.*?\)//',-1,Categories);
 want2=prxchange('s/.*\)//',-1,want);
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 2634 views
  • 1 like
  • 5 in conversation