BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
JMagenta
Obsidian | Level 7

Hello,

 

I have a dataset here:

 

data readin; 
input ID Name Status; cards;
1 David Cool
1 David Cool
1 David Not Cool
2 Sam Warm
2 Sam Lukewarm
3 Bane Cool
4 Mary Warm
3 Bane Not Cool
5 Dane Cool
6 Jenny Cool
7 Ken Not Cool
8 Simran Warm
9 Priya Hot ; run;

 And for each group I want to select the first occurrence of STATUS where the value is "Cool" . I want ignore duplicate values of "Cool",  categorize the other values in the group with another variable, Bingo, as  "Other Statuses for this person".  I want to also categorize other groups without the value of "Cool" as "Other Statuses".  If possible I would like one data step or proc sql step for this.  I know I need to sort the data too. 

 

I want the data to look something like this

 

ID Name Status Bingo      
1 David Cool Cool
1 David

Cool

Other Status for this person
David

Not Cool

Other Status for this person
Sam

Warm

Other Statuses

 

 

How can achieve this? Any help with this is welcome and much appreciated!

 

Thank you,

J

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

So this statement

Where upcase(Status) like '&Cool&';

Can never work.

If you convert STATUS to all uppercase then it could never have lower case letters o and l in it to match the condition.

 

The LOGIC is also wrong.

If you only process the records with COOL in them then the groups that do not have any such records will disappear. Use the IF statement, not a WHERE statement.

 

If you want to test if the letters COOL appear in a string you could use the INDEX() function or the FIND() function.  If you want to test if the word COOL appears in the string then use the INDEXW() or FINDW() function.

if index(lowcase(Status),'cool') then ....
if find(status,'cool','i') then ...
if indexw(lowcase(Status),'cool',' ') then ....
if findww(status,'cool',' ','i') then ...

 

But even that probably has a logic problem given your example data.  You had one record with STATUS='NOT COOL' which would match ALL of those tests, but looks to me like it means the opposite of COOL.

View solution in original post

8 REPLIES 8
PaigeMiller
Diamond | Level 26

I am not sure I understand you description of the logic used. I think more details would be helpful. I think a complete list of the desired output, for all IDs would be helfpul; you only show outputs for IDs 1 and 2, I want to see the desired output for IDs 1 to 9.

 

Furthermore, your code does not work, see the screen capture below, and I request that you fix it.

 

PaigeMiller_0-1688999716004.png

 

--
Paige Miller
JMagenta
Obsidian | Level 7
Sorry for the confusion. I needed a solution not a troll. Thank you though.
Tom
Super User Tom
Super User

@JMagenta wrote:
Sorry for the confusion. I needed a solution not a troll. Thank you though.

Just explain what you want more clearly.    For example one question is what happens to the observations before the first observation that is COOL?

 

Providing working data steps will make it easier for others to help you.  It also make it clearer what your data actually looks like.

 

Providing the complete expected output (again as working data step) will make it easier to test whether suggested code could work.

JMagenta
Obsidian | Level 7

Hello,

 

I have a  data set, SAMPLE, here:

 

ID Name Status
1 David Cool
1 David

Cool

David

Not Cool

Sam

Warm

 

 

 

For each group I want to select the first occurrence of STATUS where the value is "Cool"  and set my new variable Bingo as "Cool Person".  I want categorize  duplicate values of  "Cool" for the variable STATUS,  as well as other values of STATUS  in the group with the variable, Bingo, as  "Other Statuses for this person".  I want to also categorize other groups without the value of "Cool"  for STATUS as "Other Statuses" for BINGO.  If possible I would like one data step or proc sql step for this.  I know I need to sort the data too. 

 

I want the data to look something like this

 

ID Name Status Bingo      
1 David Cool Cool Person
1 David

Cool

Other Status for this person
David

Not Cool

Other Status for this person
Sam

Warm

Other Statuses

 

 

How can achieve this? Any help with this is welcome and much appreciated!

 

Thank you,

J

Reeza
Super User

Based on your requirements this seems like it gets you close.

 

Reeza_0-1689008921202.png

 


@JMagenta wrote:

Hello,

 

I have a  data set, SAMPLE, here:

 

ID Name Status
1 David Cool
1 David

Cool

David

Not Cool

Sam

Warm

 

 

 

For each group I want to select the first occurrence of STATUS where the value is "Cool"  and set my new variable Bingo as "Cool Person".  I want categorize  duplicate values of  "Cool" for the variable STATUS,  as well as other values of STATUS  in the group with the variable, Bingo, as  "Other Statuses for this person".  I want to also categorize other groups without the value of "Cool"  for STATUS as "Other Statuses" for BINGO.  If possible I would like one data step or proc sql step for this.  I know I need to sort the data too. 

 

I want the data to look something like this

 

ID Name Status Bingo      
1 David Cool Cool Person
1 David

Cool

Other Status for this person
David

Not Cool

Other Status for this person
Sam

Warm

Other Statuses

 

 

How can achieve this? Any help with this is welcome and much appreciated!

 

Thank you,

J


 

Tom
Super User Tom
Super User

So reading between the lines it looks like ID is the variable that identifies the groups.  And it seems like you just want to set one of two values for the first observation of a group and then a constant value for all of the extra observations for the group. Also it seems from the values of this new variable that each group of observations represents a different person.

 

NOTE: "Cool Person" is not a valid variable name so let's just use NEW as the variable name.

data have ;
  input ID Name $ Status $10. ;
cards;
1 David Cool
1 David Cool
1 David Not Cool
2 Sam Warm
;

data want;
  set have ;
  by id;
  length NEW $30 ;
  if first.id then do;
    if status = 'Cool' then NEW = 'Cool Person';
    else NEW = 'Other statuses' ;
  end;
  else NEW = 'Other Status for this person' ;
run;

Result.

Obs    ID    Name     Status      NEW

 1      1    David    Cool        Cool Person
 2      1    David    Cool        Other Status for this person
 3      1    David    Not Cool    Other Status for this person
 4      2    Sam      Warm        Other statuses

If the situation is more complicated then perhaps you could provide some more observations that have examples of those complications.

JMagenta
Obsidian | Level 7

Thanks you,

 

I will add one caveat, there is a parsing where statement that describes what the first occurrence should equal. It also includes a like statement. So I tried this:

 

data have ;
  input ID Name $ Status $10. ;
cards;
1 David Cool
1 David Cool
1 David Not Cool
2 Sam Warm
;

data want;
  set have ;
  by id;
  length NEW $30 ;
Where upcase(Status) like '&Cool&';
  if first.id then do;
    NEW = 'Cool Person';
    else do NEW = 'Other statuses' ;
  end;
  else NEW = 'Other Status for this person' ;
run;

But it didn't work.

Tom
Super User Tom
Super User

So this statement

Where upcase(Status) like '&Cool&';

Can never work.

If you convert STATUS to all uppercase then it could never have lower case letters o and l in it to match the condition.

 

The LOGIC is also wrong.

If you only process the records with COOL in them then the groups that do not have any such records will disappear. Use the IF statement, not a WHERE statement.

 

If you want to test if the letters COOL appear in a string you could use the INDEX() function or the FIND() function.  If you want to test if the word COOL appears in the string then use the INDEXW() or FINDW() function.

if index(lowcase(Status),'cool') then ....
if find(status,'cool','i') then ...
if indexw(lowcase(Status),'cool',' ') then ....
if findww(status,'cool',' ','i') then ...

 

But even that probably has a logic problem given your example data.  You had one record with STATUS='NOT COOL' which would match ALL of those tests, but looks to me like it means the opposite of COOL.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 2983 views
  • 3 likes
  • 4 in conversation