Hello,
I have a dataset here:
data readin;
input ID Name Status; cards;
1 David Cool
1 David Cool
1 David Not Cool
2 Sam Warm
2 Sam Lukewarm
3 Bane Cool
4 Mary Warm
3 Bane Not Cool
5 Dane Cool
6 Jenny Cool
7 Ken Not Cool
8 Simran Warm
9 Priya Hot ; run;
And for each group I want to select the first occurrence of STATUS where the value is "Cool" . I want ignore duplicate values of "Cool", categorize the other values in the group with another variable, Bingo, as "Other Statuses for this person". I want to also categorize other groups without the value of "Cool" as "Other Statuses". If possible I would like one data step or proc sql step for this. I know I need to sort the data too.
I want the data to look something like this
ID | Name | Status | Bingo |
---|---|---|---|
1 | David | Cool | Cool |
1 | David |
Cool |
Other Status for this person |
1 | David |
Not Cool |
Other Status for this person |
2 | Sam |
Warm |
Other Statuses |
How can achieve this? Any help with this is welcome and much appreciated!
Thank you,
J
So this statement
Where upcase(Status) like '&Cool&';
Can never work.
If you convert STATUS to all uppercase then it could never have lower case letters o and l in it to match the condition.
The LOGIC is also wrong.
If you only process the records with COOL in them then the groups that do not have any such records will disappear. Use the IF statement, not a WHERE statement.
If you want to test if the letters COOL appear in a string you could use the INDEX() function or the FIND() function. If you want to test if the word COOL appears in the string then use the INDEXW() or FINDW() function.
if index(lowcase(Status),'cool') then ....
if find(status,'cool','i') then ...
if indexw(lowcase(Status),'cool',' ') then ....
if findww(status,'cool',' ','i') then ...
But even that probably has a logic problem given your example data. You had one record with STATUS='NOT COOL' which would match ALL of those tests, but looks to me like it means the opposite of COOL.
I am not sure I understand you description of the logic used. I think more details would be helpful. I think a complete list of the desired output, for all IDs would be helfpul; you only show outputs for IDs 1 and 2, I want to see the desired output for IDs 1 to 9.
Furthermore, your code does not work, see the screen capture below, and I request that you fix it.
@JMagenta wrote:
Sorry for the confusion. I needed a solution not a troll. Thank you though.
Just explain what you want more clearly. For example one question is what happens to the observations before the first observation that is COOL?
Providing working data steps will make it easier for others to help you. It also make it clearer what your data actually looks like.
Providing the complete expected output (again as working data step) will make it easier to test whether suggested code could work.
Hello,
I have a data set, SAMPLE, here:
ID | Name | Status |
---|---|---|
1 | David | Cool |
1 | David |
Cool |
1 | David |
Not Cool |
2 | Sam |
Warm |
For each group I want to select the first occurrence of STATUS where the value is "Cool" and set my new variable Bingo as "Cool Person". I want categorize duplicate values of "Cool" for the variable STATUS, as well as other values of STATUS in the group with the variable, Bingo, as "Other Statuses for this person". I want to also categorize other groups without the value of "Cool" for STATUS as "Other Statuses" for BINGO. If possible I would like one data step or proc sql step for this. I know I need to sort the data too.
I want the data to look something like this
ID | Name | Status | Bingo |
---|---|---|---|
1 | David | Cool | Cool Person |
1 | David |
Cool |
Other Status for this person |
1 | David |
Not Cool |
Other Status for this person |
2 | Sam |
Warm |
Other Statuses |
How can achieve this? Any help with this is welcome and much appreciated!
Thank you,
J
Based on your requirements this seems like it gets you close.
@JMagenta wrote:
Hello,
I have a data set, SAMPLE, here:
ID Name Status 1 David Cool 1 David Cool
1 David Not Cool
2 Sam Warm
For each group I want to select the first occurrence of STATUS where the value is "Cool" and set my new variable Bingo as "Cool Person". I want categorize duplicate values of "Cool" for the variable STATUS, as well as other values of STATUS in the group with the variable, Bingo, as "Other Statuses for this person". I want to also categorize other groups without the value of "Cool" for STATUS as "Other Statuses" for BINGO. If possible I would like one data step or proc sql step for this. I know I need to sort the data too.
I want the data to look something like this
ID Name Status Bingo 1 David Cool Cool Person 1 David Cool
Other Status for this person 1 David Not Cool
Other Status for this person 2 Sam Warm
Other Statuses
How can achieve this? Any help with this is welcome and much appreciated!
Thank you,
J
So reading between the lines it looks like ID is the variable that identifies the groups. And it seems like you just want to set one of two values for the first observation of a group and then a constant value for all of the extra observations for the group. Also it seems from the values of this new variable that each group of observations represents a different person.
NOTE: "Cool Person" is not a valid variable name so let's just use NEW as the variable name.
data have ;
input ID Name $ Status $10. ;
cards;
1 David Cool
1 David Cool
1 David Not Cool
2 Sam Warm
;
data want;
set have ;
by id;
length NEW $30 ;
if first.id then do;
if status = 'Cool' then NEW = 'Cool Person';
else NEW = 'Other statuses' ;
end;
else NEW = 'Other Status for this person' ;
run;
Result.
Obs ID Name Status NEW 1 1 David Cool Cool Person 2 1 David Cool Other Status for this person 3 1 David Not Cool Other Status for this person 4 2 Sam Warm Other statuses
If the situation is more complicated then perhaps you could provide some more observations that have examples of those complications.
Thanks you,
I will add one caveat, there is a parsing where statement that describes what the first occurrence should equal. It also includes a like statement. So I tried this:
data have ;
input ID Name $ Status $10. ;
cards;
1 David Cool
1 David Cool
1 David Not Cool
2 Sam Warm
;
data want;
set have ;
by id;
length NEW $30 ;
Where upcase(Status) like '&Cool&';
if first.id then do;
NEW = 'Cool Person';
else do NEW = 'Other statuses' ;
end;
else NEW = 'Other Status for this person' ;
run;
But it didn't work.
So this statement
Where upcase(Status) like '&Cool&';
Can never work.
If you convert STATUS to all uppercase then it could never have lower case letters o and l in it to match the condition.
The LOGIC is also wrong.
If you only process the records with COOL in them then the groups that do not have any such records will disappear. Use the IF statement, not a WHERE statement.
If you want to test if the letters COOL appear in a string you could use the INDEX() function or the FIND() function. If you want to test if the word COOL appears in the string then use the INDEXW() or FINDW() function.
if index(lowcase(Status),'cool') then ....
if find(status,'cool','i') then ...
if indexw(lowcase(Status),'cool',' ') then ....
if findww(status,'cool',' ','i') then ...
But even that probably has a logic problem given your example data. You had one record with STATUS='NOT COOL' which would match ALL of those tests, but looks to me like it means the opposite of COOL.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.