BookmarkSubscribeRSS Feed
urban58
Quartz | Level 8

Hello,

 

I'm trying to categorize a variable that could have up to 8 different values, in some instances the more than 1 value has been saved for the same id.

Below is a sample data which works fine for mutually exclusive values but not when they have been mixed.

How do I code this to be able to get "More than one.." category?

Appreciate any help!

Margaret

 

proc format;
value fruit
1=1:Apple
2=2:Blueberry
3=3:More than one fruit;

data fruit;
input id fruit_code $;
datalines;
101 A
102 B
103 AB
104
;
data a;
set fruit;
if index(fruit_code, 'A') >0 then fruit=1;
else if index(fruit_code, 'B') >0 then fruit=2;
else if index(fruit_code, 'AB') >0 then fruit=3;
else if index(fruit_code, ' ') >0 then fruit=.;
run;

4 REPLIES 4
ChrisNZ
Tourmaline | Level 20

Unsure what the issue is. Like this?

if      index(fruit_code, 'AB') then fruit=3;

else if index(fruit_code, 'A' ) then fruit=1;
else if index(fruit_code, 'B' ) then fruit=2;
else                                 fruit=.;

 

 

ballardw
Super User

If your values actually are represented by exactly one character per "category" to get a "more than one" you can use the LENGTH function that would return the number of characters in the Fruit_code.

Instead of

else if index(fruit_code, 'AB') >0 then fruit=3;
else if length(fruit_code)>1 then fruit=3;

However this comparison should be the first with the type of code you show because if fruit_code = 'AB' then index(fruit_code, 'A') >0 is true as is index(fruit_code, 'B') >0

With Index function you would also have to test for 'BA' since you have not stated that is an impossible value.

 

If your fruit code values are actually something else then you need to provide actual values as this can get messy quickly.

 

 

And whose decision was it to put multiple values into a single variable and then compound the issue by not providing a delimiter between values?

urban58
Quartz | Level 8

Thank you both for your replies.

The data is of course much more complicated as ballardw astutely points out.

Here is a more realistic dataset


for this problem, the characters pertaining to fruit (A,B,C,N,P,R,Y,O and missing), the other characters (E,L) can be ignored (i.e. removed) for his problem

How do I adapt my code below to have another category fruit=9 (more than 1 fruit) with E,L removed

Many thanks!

Margaret

 

data fruit;

input id fruit_code $;
datalines;
101
102 A
103 B
104 BB
105 BEB
105 BEBN
106 BECBN
107 E
108 EC
109 ECBN
110 ECN
111 LL
112 LLR
113 LLC
114 C
115 Y
116 YA
117 YEA
118 YECA
119 LPR
120 O
;


data a;
set fruit;
if index(fruit_code, 'A') >0 then fruit=1;
else if index(fruit_code, 'B') >0 then fruit=2;
else if index(fruit_code, 'C') >0 then fruit=3;
else if index(fruit_code, 'N') >0 then fruit=4;
else if index(fruit_code, 'P') >0 then fruit=5;
else if index(fruit_code, 'R') >0 then fruit=6;
else if index(fruit_code, 'Y') >0 then fruit=7;
else if index(fruit_code, 'O') >0 then fruit=8;
else if index(fruit_code, ' ') >0 then fruit=.;
run;

ballardw
Super User

@urban58 wrote:

Thank you both for your replies.

The data is of course much more complicated as ballardw astutely points out.

Here is a more realistic dataset


for this problem, the characters pertaining to fruit (A,B,C,N,P,R,Y,O and missing), the other characters (E,L) can be ignored (i.e. removed) for his problem

How do I adapt my code below to have another category fruit=9 (more than 1 fruit) with E,L removed

Many thanks!

Margaret

 

data fruit;

input id fruit_code $;
datalines;
101
102 A
103 B
104 BB
105 BEB
105 BEBN
106 BECBN
107 E
108 EC
109 ECBN
110 ECN
111 LL
112 LLR
113 LLC
114 C
115 Y
116 YA
117 YEA
118 YECA
119 LPR
120 O
;


data a;
set fruit;
if index(fruit_code, 'A') >0 then fruit=1;
else if index(fruit_code, 'B') >0 then fruit=2;
else if index(fruit_code, 'C') >0 then fruit=3;
else if index(fruit_code, 'N') >0 then fruit=4;
else if index(fruit_code, 'P') >0 then fruit=5;
else if index(fruit_code, 'R') >0 then fruit=6;
else if index(fruit_code, 'Y') >0 then fruit=7;
else if index(fruit_code, 'O') >0 then fruit=8;
else if index(fruit_code, ' ') >0 then fruit=.;
run;


I would suggest you show us what your expected output is for some of those values. The reason I ask is because you have added a significant complication with the phrase " the other characters (E,L) can be ignored (i.e. removed) " and the duplication of codes within the string of codes. Consider is "BB" supposed to be a multiple or not?

 

If you do not want to consider E and L when present then you can use  compress(fruit_code,'EL') instead of fruit_code for the comparisons. The compress function removes the individual characters in the second parameter from the string.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 1018 views
  • 0 likes
  • 3 in conversation