BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
emaneman
Pyrite | Level 9

Dear all,

I apologize for asking a similar question to one I asked sometime ago, but it does not work with string variables and that is what i need now.

 

I have a variable VAR which includes a series of words. I need to create a new variable multiple new variables which correspond to the frequencies of certain words in VAR.

 

In the example below, starting out with VAR, I need to create NEWRED, NEWBLUE and NEWYELLOW, which would have the values as indicated.

 

Example:

OBSVARNEWREDNEWBLUENEWYELLOW
1red blue red red blue yellow321
2red100
3blue yellow011

 

Thanks in advance for your attention.

 

E

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20
data have;
input OBS	VAR & $50.	;*NEWRED	NEWBLUE	NEWYELLOW;
cards;
1	red blue red red blue yellow	3	2	1
2	red	 1	0	0
3	blue yellow	  0	1	1
;

data want;
 set have;
 NEWRED=count(var,'red');
 NEWBLUE=count(var,'blue');
 newyellow=count(var,'yellow');
run;

 

OBS VAR NEWRED NEWBLUE newyellow
1 red blue red red blue yellow 3 2 1
2 red 1 0 0
3 blue yellow 0 1 1

View solution in original post

10 REPLIES 10
novinosrin
Tourmaline | Level 20
data have;
input OBS	VAR & $50.	;*NEWRED	NEWBLUE	NEWYELLOW;
cards;
1	red blue red red blue yellow	3	2	1
2	red	 1	0	0
3	blue yellow	  0	1	1
;

data want;
 set have;
 NEWRED=count(var,'red');
 NEWBLUE=count(var,'blue');
 newyellow=count(var,'yellow');
run;

 

OBS VAR NEWRED NEWBLUE newyellow
1 red blue red red blue yellow 3 2 1
2 red 1 0 0
3 blue yellow 0 1 1
emaneman
Pyrite | Level 9

Thank you, this works well and it's efficient.

I should have, however, specified a bit further, since what I also need is a NEW variable that counts the occurrence of multiple colors.

See below variable called REDANDBLUE. Of course, i could calculate it in a further step, as a sum of NEWRED and NEWBLUE, but I wonder whether it could be done in one step. I tried  NEWRED=count(var,'red' 'blue') and I have tried NEWRED=count(var,'red', 'blue'), but neither works.

Again, sorry for not posting the whole thing at once.

e

 

OBSVARNEWREDNEWBLUENEWYELLOWREDANDBLUE
1red blue red red blue yellow3215
2red1001
3blue yellow0111
novinosrin
Tourmaline | Level 20

HI @emaneman  Add another assignment statement in the same datastep-

data want;
 set have;
 NEWRED=count(var,'red');
 NEWBLUE=count(var,'blue');
 newyellow=count(var,'yellow');
REDANDBLUE=sum(newred,newblue);
 
run;
Tom
Super User Tom
Super User

It looks like you just need to add.

REDANDBLUE=NEWRED+NEWBLUE;
andreas_lds
Jade | Level 19

@emaneman wrote:

Thank you, this works well and it's efficient.

I should have, however, specified a bit further, since what I also need is a NEW variable that counts the occurrence of multiple colors.

See below variable called REDANDBLUE. Of course, i could calculate it in a further step, as a sum of NEWRED and NEWBLUE, but I wonder whether it could be done in one step. I tried  NEWRED=count(var,'red' 'blue') and I have tried NEWRED=count(var,'red', 'blue'), but neither works.

Again, sorry for not posting the whole thing at once.

e

 

OBS VAR NEWRED NEWBLUE NEWYELLOW REDANDBLUE
1 red blue red red blue yellow 3 2 1 5
2 red 1 0 0 1
3 blue yellow 0 1 1 1

Why is "RedAndBlue" = 1 in the 2nd and 3rd obs, both don't have "RED BLUE" as value.

emaneman
Pyrite | Level 9

The varname is a bit misleading. I don't mean that both red and blue should appear for REDANDBLUE to have a value or 1 or more, but simply counting the number of either red or blue in VAR. As it has been pointed out, I can of course do the sum of the variables RED and BLUE, which I can create with the COUNT function, but I was wondering whether the COUNT function can count different occurrences (eg. red, blue) in one go. But it does not seem to be the case, since REDBLUE=count(var,'red', 'blue') does not work.

andreas_lds
Jade | Level 19

@emaneman wrote:

The varname is a bit misleading. I don't mean that both red and blue should appear for REDANDBLUE to have a value or 1 or more, but simply counting the number of either red or blue in VAR. As it has been pointed out, I can of course do the sum of the variables RED and BLUE, which I can create with the COUNT function, but I was wondering whether the COUNT function can count different occurrences (eg. red, blue) in one go. But it does not seem to be the case, since REDBLUE=count(var,'red', 'blue') does not work.


Well, have a look at the docs of the function "count" ... the function is just not working that way. Other have already posted a solution, just adding the counts for blue and red will solve the problem.

emaneman
Pyrite | Level 9
yes, I will go with that option.
Thank you.
PeterClemmensen
Tourmaline | Level 20

Try this

 

data have;
input VAR $50.;
datalines;	
red blue red red blue yellow 
red	                       
blue yellow	                 
;

data want;
   set have;
   newred    = count(VAR, 'red');
   newblue   = count(VAR, 'blue');
   newyellow = count(VAR, 'yellow');
run;
ballardw
Super User

When discussing about occurrence of one string within another you ALWAYS have to include whether this is supposed to include composite words that may contain your target values. So if you are interested in "red" do you count "redfish" or "toured" as an occurence?

 

I realize that your actual data is likely to be more complex and use different strings but the shorter your target and/or more common the letters in the target the more likely to have that target string appear in a composite word or part of another word.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 4110 views
  • 1 like
  • 6 in conversation