Dear all,
request your help with the below query:
I have a data set which looks like below:
ID ID_Ac Status
1 112 Green
1 113 Green
1 156 Red
2 121 Blue
3 135 red
3 142 red
3 155 Red
3 198 Blue
The output must have an additional column called final_status
The condition that I need to implement here is :
If atleast one is Greens for an ID, the final status should be Green (like 1 green and 4 reds for same ID - we have to update as Green for all IDs:
ID ID_Ac Status Final Status
1 112 Green Green
1 113 Green Green
1 156 Red Green
2 121 yellow yellow
3 135 red Blue
3 142 red Blue
3 198 Blue Blue
4 202 red Red
4 222 red red
Please help 🙂
Thank you in advance..
Or , the output can also look like below:
only unique IDs with the final status..
ID Final Status
1 Green
2 yellow
3 Blue
4 Red
Please help.....
Why is ID 3 blue, and not red? Is there a hierarchy?
It seems your rule is not complete.
HI LinisH,
ID 3 is Blue because there is atleast oone blue for that ID. RED will be taken only if all the values for the ID are REDs..
Thank you..
Hello @don21,
It really looks like you have a hierarchy of status values, e.g. "Green is preferred over Yellow, Yellow is preferred over Blue, ..." If this is the case, variable STATUS should actually be numeric and the strings "Green" etc. should be only format labels. Then, the code suggested below could be simplified (in particular, you wouldn't need an informat).
proc format;
invalue rank
'green' = 1
'yellow' = 2
'blue' = 3
'red' = 4;
value rank
1 = 'Green'
2 = 'Yellow'
3 = 'Blue'
4 = 'Red';
run;
data have;
input ID ID_Ac Status $;
cards;
1 112 Green
1 113 Green
1 156 Red
2 121 yellow
3 135 red
3 142 red
3 198 Blue
4 202 red
4 222 red
;
proc sql;
create table want as
select id, put(min(input(lowcase(status), rank.)), rank.) as Final_Status
from have
group by id;
quit;
If variable STATUS has only missing values for a particular ID, Final_Status will be missing. Otherwise, missing values will be ignored. Or should missing values be handled differently?
To switch to your original output format, simply change the SELECT statement to "select *, put(..." and add an ORDER BY clause:
group by id
order by id, id_ac;
Dear Reinhard,
Thank you for the quick response, I tried running the code. went welll till the end but, got this error in the final step:
NOTE: The query requires remerging summary statistics back with the original data.
NOTE: Invalid string.
NOTE: Invalid argument to function INPUT. Missing values may be generated.
NOTE: Table WORK.SAMPLE created, with 9 rows and 3 columns.
Please advise.
Thank you.. 🙂
looking forward to your advise on this..
Thank you in advance.
Hi all,
I made some changes to the source and required data to make my request look more clear:
HAVE:
Account Relation Status
332 1 OPENED Open, if atleast 1 account tagged to the relationship is in Open status
112 1 CLOSED Unclaimed, if all accounts tagged to the relationship is in Unclaimed
154 2 UNCLAIMED Dormant, if all accounts tagged to the relationship is either Dormant / Unclaimed status
144 2 UNCLAIMED Closed, if all accounts tagged to the account are closed
198 3 DORMANT
189 3 DORMANT
157 3 UNCLAIMED
111 4 CLOSED
163 4 CLOSED
127 4 CLOSED
REQUIERD:
PFA attached file for more clarity...
Please help Thank you..
The below code is applicable only if the status var have the below mentioned values.
data have;
input Account Relation Status $ 10.;
cards;
332 1 OPENED
112 1 CLOSED
154 2 UNCLAIMED
144 2 UNCLAIMED
198 3 DORMANT
189 3 DORMANT
157 3 UNCLAIMED
111 4 CLOSED
163 4 CLOSED
127 4 CLOSED
;
proc sql;
create table want1 as
select *, count( distinct status) as diststatcnt from have
group by relation;
create table want2 as
select *, count(status) as statcnt from want1
group by relation, status;
create table want3 as select *, count(relation) as relcnt from want2
group by relation;
quit;
data want(drop = diststatcnt statcnt relcnt);
set want3;
length Final_Status $ 15;
if status = "OPENED" then final_status = "Open";
if status = "CLOSED" and statcnt = relcnt then final_status = "Close";
if status = "UNCLAIMED" and statcnt = relcnt then final_status = "Unclaimed";
if status in ('DORMANT' 'UNCLAIMED') and diststatcnt = 2 then final_status = "Dormant";
if final_status = " " then final_status = "Open";
run;
Hi @don21,
Thanks for the clarification of your request. Please note that your specifications do not cover the following cases:
Please confirm that these cases cannot occur (not even due to data issues) or define the Final_Status for each of them.
For illustration, you can add the following datalines to your sample data and apply suggested solutions to them.
111 5 DORMANT 163 5 CLOSED 127 5 CLOSED 111 6 CLOSED 163 6 UNCLAIMED 127 6 CLOSED 111 7 DORMANT 163 7 CLOSED 127 7 UNCLAIMED
@sindhu wrote:
Thank you for the quick response, I tried running the code. went welll till the end but, got this error in the final step:
(...)
NOTE: Invalid string.
NOTE: Invalid argument to function INPUT. Missing values may be generated.
NOTE: Table WORK.SAMPLE created, with 9 rows and 3 columns.
Please advise.
Hi @don21,
Well, you did not run the code I had posted, but some different code: Not only is your output dataset named differently (SAMPLE; of course, this is not the issue), but more importantly it has 3 columns. My code results correctly in a work dataset WANT with 4 rows and 2 columns. So, you must have modified the SELECT statement and used different input data (thus getting 9 rows as a result).
Most probably, your input data contain a value which is not covered by informat RANK. If I change, for example, "yellow" to "orange" in my input data without adapting the INVALUE statement of PROC FORMAT correspondingly, I get the same "Invalid string" issue.
So, please make sure that the lowcase version of every single status value in your input data is listed in the INVALUE statement of PROC FORMAT in order to specify a rank. Alternatively (if you cannot know all of your status values in advance), you could modify the format and informat as follows:
proc format;
invalue rank
'green' = 1
'yellow' = 2
'blue' = 3
'red' = 4
other = 999
;
value rank
1 = 'Green'
2 = 'Yellow'
3 = 'Blue'
4 = 'Red'
999='-invalid-';
run;
Then, all unexpected values of status would be classified as "worse than 'red'" and if an ID has only such values, its Final_Status would be '-invalid-' (of course, you may modify this label).
Thank you @FreelanceReinh.. 🙂
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.