Solved: How to select first occurrence, ignore duplicates, categorize groups w...

JMagenta · Posted 07-10-2023 10:27 AM

Hello,

I have a dataset here:

data readin; 
input ID Name Status; cards; 
1 David Cool 
1 David Cool 
1 David Not Cool 
2 Sam Warm 
2 Sam Lukewarm 
3 Bane Cool 
4 Mary Warm 
3 Bane Not Cool 
5 Dane Cool 
6 Jenny Cool 
7 Ken Not Cool
8 Simran Warm 
9 Priya Hot ; run;

And for each group I want to select the first occurrence of STATUS where the value is "Cool" . I want ignore duplicate values of "Cool", categorize the other values in the group with another variable, Bingo, as "Other Statuses for this person". I want to also categorize other groups without the value of "Cool" as "Other Statuses". If possible I would like one data step or proc sql step for this. I know I need to sort the data too.

I want the data to look something like this

ID	Name	Status	Bingo
1	David	Cool	Cool
1	David	Cool	Other Status for this person
1	David	Not Cool	Other Status for this person
2	Sam	Warm	Other Statuses

How can achieve this? Any help with this is welcome and much appreciated!

Thank you,

J

Tom · Posted 07-10-2023 02:13 PM

So this statement

Where upcase(Status) like '&Cool&';

Can never work.

If you convert STATUS to all uppercase then it could never have lower case letters o and l in it to match the condition.

The LOGIC is also wrong.

If you only process the records with COOL in them then the groups that do not have any such records will disappear. Use the IF statement, not a WHERE statement.

If you want to test if the letters COOL appear in a string you could use the INDEX() function or the FIND() function. If you want to test if the word COOL appears in the string then use the INDEXW() or FINDW() function.

if index(lowcase(Status),'cool') then ....
if find(status,'cool','i') then ...
if indexw(lowcase(Status),'cool',' ') then ....
if findww(status,'cool',' ','i') then ...

But even that probably has a logic problem given your example data. You had one record with STATUS='NOT COOL' which would match ALL of those tests, but looks to me like it means the opposite of COOL.

View solution in original post

PaigeMiller · Posted 07-10-2023 10:36 AM

I am not sure I understand you description of the logic used. I think more details would be helpful. I think a complete list of the desired output, for all IDs would be helfpul; you only show outputs for IDs 1 and 2, I want to see the desired output for IDs 1 to 9.

Furthermore, your code does not work, see the screen capture below, and I request that you fix it.

--
Paige Miller

JMagenta · Posted 07-10-2023 12:24 PM

Sorry for the confusion. I needed a solution not a troll. Thank you though.

Tom · Posted 07-10-2023 12:38 PM

@JMagenta wrote:
Sorry for the confusion. I needed a solution not a troll. Thank you though.

Just explain what you want more clearly. For example one question is what happens to the observations before the first observation that is COOL?

Providing working data steps will make it easier for others to help you. It also make it clearer what your data actually looks like.

Providing the complete expected output (again as working data step) will make it easier to test whether suggested code could work.

JMagenta · Posted 07-10-2023 12:37 PM

Hello,

I have a data set, SAMPLE, here:

ID	Name	Status
1	David	Cool
1	David	Cool
1	David	Not Cool
2	Sam	Warm

For each group I want to select the first occurrence of STATUS where the value is "Cool" and set my new variable Bingo as "Cool Person". I want categorize duplicate values of "Cool" for the variable STATUS, as well as other values of STATUS in the group with the variable, Bingo, as "Other Statuses for this person". I want to also categorize other groups without the value of "Cool" for STATUS as "Other Statuses" for BINGO. If possible I would like one data step or proc sql step for this. I know I need to sort the data too.

I want the data to look something like this

ID	Name	Status	Bingo
1	David	Cool	Cool Person
1	David	Cool	Other Status for this person
1	David	Not Cool	Other Status for this person
2	Sam	Warm	Other Statuses

How can achieve this? Any help with this is welcome and much appreciated!

Thank you,

J

Quick Reply

Reeza · Posted 07-10-2023 01:09 PM

Based on your requirements this seems like it gets you close.

@JMagenta wrote:

Hello,

I have a data set, SAMPLE, here:

ID Name Status

1 David Cool

1 David
Cool

1 David
Not Cool

2 Sam
Warm

For each group I want to select the first occurrence of STATUS where the value is "Cool" and set my new variable Bingo as "Cool Person". I want categorize duplicate values of "Cool" for the variable STATUS, as well as other values of STATUS in the group with the variable, Bingo, as "Other Statuses for this person". I want to also categorize other groups without the value of "Cool" for STATUS as "Other Statuses" for BINGO. If possible I would like one data step or proc sql step for this. I know I need to sort the data too.

I want the data to look something like this

ID Name Status Bingo

1 David Cool Cool Person

1 David
Cool
Other Status for this person

1 David
Not Cool
Other Status for this person

2 Sam
Warm
Other Statuses

How can achieve this? Any help with this is welcome and much appreciated!

Thank you,

J

Tom · Posted 07-10-2023 01:32 PM

So reading between the lines it looks like ID is the variable that identifies the groups. And it seems like you just want to set one of two values for the first observation of a group and then a constant value for all of the extra observations for the group. Also it seems from the values of this new variable that each group of observations represents a different person.

NOTE: "Cool Person" is not a valid variable name so let's just use NEW as the variable name.

data have ;
  input ID Name $ Status $10. ;
cards;
1 David Cool
1 David Cool
1 David Not Cool
2 Sam Warm
;

data want;
  set have ;
  by id;
  length NEW $30 ;
  if first.id then do;
    if status = 'Cool' then NEW = 'Cool Person';
    else NEW = 'Other statuses' ;
  end;
  else NEW = 'Other Status for this person' ;
run;

Result.

Obs    ID    Name     Status      NEW

 1      1    David    Cool        Cool Person
 2      1    David    Cool        Other Status for this person
 3      1    David    Not Cool    Other Status for this person
 4      2    Sam      Warm        Other statuses

If the situation is more complicated then perhaps you could provide some more observations that have examples of those complications.

JMagenta · Posted 07-10-2023 02:03 PM

Thanks you,

I will add one caveat, there is a parsing where statement that describes what the first occurrence should equal. It also includes a like statement. So I tried this:

data have ;
  input ID Name $ Status $10. ;
cards;
1 David Cool
1 David Cool
1 David Not Cool
2 Sam Warm
;

data want;
  set have ;
  by id;
  length NEW $30 ;
Where upcase(Status) like '&Cool&';
  if first.id then do;
    NEW = 'Cool Person';
    else do NEW = 'Other statuses' ;
  end;
  else NEW = 'Other Status for this person' ;
run;

But it didn't work.

Tom · Posted 07-10-2023 02:13 PM

So this statement

Where upcase(Status) like '&Cool&';

Can never work.

If you convert STATUS to all uppercase then it could never have lower case letters o and l in it to match the condition.

The LOGIC is also wrong.

If you only process the records with COOL in them then the groups that do not have any such records will disappear. Use the IF statement, not a WHERE statement.

If you want to test if the letters COOL appear in a string you could use the INDEX() function or the FIND() function. If you want to test if the word COOL appears in the string then use the INDEXW() or FINDW() function.

if index(lowcase(Status),'cool') then ....
if find(status,'cool','i') then ...
if indexw(lowcase(Status),'cool',' ') then ....
if findww(status,'cool',' ','i') then ...

But even that probably has a logic problem given your example data. You had one record with STATUS='NOT COOL' which would match ALL of those tests, but looks to me like it means the opposite of COOL.

How to select first occurrence, ignore duplicates, categorize groups without occurrence?

Re: Select first occurrence of value then categorize other values in group, categorize other grou

Re: How to select first occurrence, ignore duplicates, categorize groups without occurrence?

Re: How to select first occurrence, ignore duplicates, categorize groups without occurrence?

Re: How to select first occurrence, ignore duplicates, categorize groups without occurrence?

Select first occurrence of value then categorize other values in group, categorize other groups

Re: Select first occurrence of value then categorize other values in group, categorize other grou

Re: Select first occurrence of value then categorize other values in group, categorize other grou

Re: Select first occurrence of value then categorize other values in group, categorize other grou

Re: Select first occurrence of value then categorize other values in group, categorize other grou

Classroom Training Available!