BookmarkSubscribeRSS Feed
don21
Quartz | Level 8

Dear all,

 

request your help with the below query:

 

I have a data set which looks like below:

ID   ID_Ac   Status
1    112      Green
1    113     Green
1    156      Red
2    121      Blue
3    135     red
3    142     red
3    155     Red
3    198     Blue

 

The output must have an additional column called final_status

The condition that I need to implement here is :

If atleast one is Greens for an ID, the final status should be Green (like 1 green and 4 reds for same ID - we have to update as Green for all IDs:

 

ID ID_Ac Status       Final Status
1 112  Green         Green
1 113  Green        Green
1 156  Red           Green
2 121  yellow            yellow
3 135  red                Blue
3 142  red                Blue
3 198  Blue                 Blue
4 202  red                       Red
4 222  red                red

 

 

Please help 🙂

 

 

Thank you in advance..

12 REPLIES 12
don21
Quartz | Level 8

Or , the output can also look like below:

only unique IDs with the final status..

 

ID    Final Status
1        Green
2          yellow
3           Blue
4          Red

 

 

Please help.....

LinusH
Tourmaline | Level 20

Why is ID 3 blue, and not red? Is there a hierarchy?

It seems your rule is not complete.

Data never sleeps
don21
Quartz | Level 8

HI LinisH,

 

 

ID 3 is Blue because there is atleast oone blue for that ID. RED will be taken only if all the values for the ID are REDs..

 

 

Thank you..

FreelanceReinh
Jade | Level 19

Hello @don21,

 

It really looks like you have a hierarchy of status values, e.g. "Green is preferred over Yellow, Yellow is preferred over Blue, ..." If this is the case, variable STATUS should actually be numeric and the strings "Green" etc. should be only format labels. Then, the code suggested below could be simplified (in particular, you wouldn't need an informat).

proc format;
invalue rank
'green'  = 1
'yellow' = 2
'blue'   = 3
'red'    = 4;

value rank
1 = 'Green'
2 = 'Yellow'
3 = 'Blue'
4 = 'Red';
run;

data have;
input ID ID_Ac Status $;
cards;
1 112 Green
1 113 Green
1 156 Red
2 121 yellow
3 135 red
3 142 red
3 198 Blue
4 202 red
4 222 red
;

proc sql;
create table want as
select id, put(min(input(lowcase(status), rank.)), rank.) as Final_Status
from have
group by id;
quit;

If variable STATUS has only missing values for a particular ID, Final_Status will be missing. Otherwise, missing values will be ignored. Or should missing values be handled differently?

To switch to your original output format, simply change the SELECT statement to "select *, put(..." and add an ORDER BY clause:

group by id
order by id, id_ac;

 

don21
Quartz | Level 8

Dear Reinhard,

 

 

Thank you for the quick response, I tried running the code. went welll till the end but, got this error in the final step:

 

NOTE: The query requires remerging summary statistics back with the original data.
NOTE: Invalid string.
NOTE: Invalid argument to function INPUT. Missing values may be generated.
NOTE: Table WORK.SAMPLE created, with 9 rows and 3 columns.

 

Please advise.

 

 

Thank you.. 🙂

don21
Quartz | Level 8

looking forward to your advise on this..

 

 

Thank you in advance.

don21
Quartz | Level 8
I need to mention that, I got the output table but the final;_Status is blank in it
don21
Quartz | Level 8

Hi all,

 

 

I made some changes to the source and required data to make my request look more clear:

 

HAVE:

 

Account   Relation            Status
332           1                 OPENED                            Open, if atleast 1 account tagged to the relationship is in Open status
112          1                   CLOSED                            Unclaimed, if all accounts tagged to the relationship is in Unclaimed
154          2                  UNCLAIMED               Dormant, if all accounts tagged to the relationship is either Dormant / Unclaimed status
144          2                  UNCLAIMED                      Closed, if all accounts tagged to the account are closed
198          3                   DORMANT
189          3                    DORMANT
157          3                 UNCLAIMED
111          4                 CLOSED
163          4                   CLOSED
127         4                   CLOSED

 

 

REQUIERD:

 

PFA attached file for more clarity...

 

Please help Thank you..

MadhuKorni
Quartz | Level 8

The below code is applicable only if the status var have the below mentioned values.


data have;
input Account Relation Status $ 10.;
cards;
332 1 OPENED
112 1 CLOSED
154 2 UNCLAIMED
144 2 UNCLAIMED
198 3 DORMANT
189 3 DORMANT
157 3 UNCLAIMED
111 4 CLOSED
163 4 CLOSED
127 4 CLOSED
;

proc sql;
create table want1 as
select *, count( distinct status) as diststatcnt from have
group by relation;
create table want2 as
select *, count(status) as statcnt from want1
group by relation, status;
create table want3 as select *, count(relation) as relcnt from want2
group by relation;
quit;

 

data want(drop = diststatcnt statcnt relcnt);
set want3;
length Final_Status $ 15;
if status = "OPENED" then final_status = "Open";
if status = "CLOSED" and statcnt = relcnt then final_status = "Close";
if status = "UNCLAIMED" and statcnt = relcnt then final_status = "Unclaimed";
if status in ('DORMANT' 'UNCLAIMED') and diststatcnt = 2 then final_status = "Dormant";
if final_status = " " then final_status = "Open";
run;

FreelanceReinh
Jade | Level 19

Hi @don21,

 

Thanks for the clarification of your request. Please note that your specifications do not cover the following cases:

  • status values of one relationship include DORMANT and CLOSED, but neither OPENED nor UNCLAIMED
  • status values of one relationship include UNCLAIMED and CLOSED, but neither OPENED nor DORMANT
  • status values of one relationship include UNCLAIMED, DORMANT and CLOSED, but not OPENED.

Please confirm that these cases cannot occur (not even due to data issues) or define the Final_Status for each of them.

 

For illustration, you can add the following datalines to your sample data and apply suggested solutions to them.

111 5 DORMANT
163 5 CLOSED 
127 5 CLOSED
111 6 CLOSED 
163 6 UNCLAIMED
127 6 CLOSED
111 7 DORMANT
163 7 CLOSED 
127 7 UNCLAIMED
FreelanceReinh
Jade | Level 19

@sindhu wrote:

 

Thank you for the quick response, I tried running the code. went welll till the end but, got this error in the final step:

(...)
NOTE: Invalid string.
NOTE: Invalid argument to function INPUT. Missing values may be generated.
NOTE: Table WORK.SAMPLE created, with 9 rows and 3 columns.

 

Please advise.


Hi @don21,

 

Well, you did not run the code I had posted, but some different code: Not only is your output dataset named differently (SAMPLE; of course, this is not the issue), but more importantly it has 3 columns. My code results correctly in a work dataset WANT with 4 rows and 2 columns. So, you must have modified the SELECT statement and used different input data (thus getting 9 rows as a result).

 

Most probably, your input data contain a value which is not covered by informat RANK. If I change, for example, "yellow" to "orange" in my input data without adapting the INVALUE statement of PROC FORMAT correspondingly, I get the same "Invalid string" issue.

 

So, please make sure that the lowcase version of every single status value in your input data is listed in the INVALUE statement of PROC FORMAT in order to specify a rank. Alternatively (if you cannot know all of your status values in advance), you could modify the format and informat as follows:

proc format;
invalue rank
'green'  = 1
'yellow' = 2
'blue'   = 3
'red'    = 4
other    = 999
;

value rank
1 = 'Green'
2 = 'Yellow'
3 = 'Blue'
4 = 'Red'
999='-invalid-';
run;

Then, all unexpected values of status would be classified as "worse than 'red'" and if an ID has only such values, its Final_Status would be '-invalid-' (of course, you may modify this label).

don21
Quartz | Level 8

Thank you @FreelanceReinh..  🙂

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 2974 views
  • 2 likes
  • 4 in conversation