Update to the logic
data value ;
input Case Type1 $ Value1 Date1 mmddyy10. Type2 $ Value2 Date2 mmddyy10. Type3 $ Value3 Date3 mmddyy10.;
cards;
1 V 7 12/14/2021 C 12 11/13/2020 V 24 06/05/2019
2 C 8 12/15/2021 B 13 11/14/2020 C 9 06/06/2019
3 V 7 12/14/2021 V 7 11/13/2020 V 7 12/16/2021
4 C 8 12/15/2021 C 8 11/14/2020 C 8 06/06/2019
5 . 6 12/15/2021 B 13 11/14/2020 B 8 06/06/2019
;
/*REVISED LOGIC
CASE#1 Regardless of other Types, select V with highest Value and associated Date
CASE#2 If V type doesn't exist, then pick C type with the lowest Value and associated Date
CASE#3 and CASE#4 If same types exist with same values (for example only C or only V WITH VALUE OF 7), then pick the Value with oldest Date
CASE#5 If V or C don't exist but type is blank or B, then pick the lowest Value and associated Date*/;
run;
Orange colored ones need to be in Value_New and Date_New
_________________________________________________________________________________________________________________
Using SAS arrays and do loop, I am trying to create new two columns Value_New and Date_New. The logic :
If type is V, then pick the highest value and Date associated with that V
If only type C exists, than take the lowest value and Date associated with that C
For any type, if there is only one type (only V or only C), and if values are equal, than take the one with earliest date.
Case | Type1 | Value1 | Date1 | Type2 | Value2 | Date2 | Type3 | Value3 | Date3 | Value_New | Date_New |
1 | V | 7 | 12/14/2021 | C | 12 | 11/13/2020 | V | 24 | 6/5/2019 | 24 | 6/5/2019 |
2 | C | 8 | 12/15/2021 | V | 13 | 11/14/2020 | C | 8 | 6/6/2019 | 8 | 6/6/2019 |
Technique when using a long dataset:
data value;
input Case Type1 $ Value1 Date1 mmddyy10. Type2 $ Value2 Date2 mmddyy10. Type3 $ Value3 Date3 mmddyy10.;
cards;
1 V 7 12/14/2021 C 12 11/13/2020 V 24 06/05/2019
2 C 8 12/15/2021 B 13 11/14/2020 C 9 06/06/2019
3 V 7 12/14/2021 V 7 11/13/2020 V 7 12/16/2021
4 C 8 12/15/2021 C 8 11/14/2020 C 8 06/06/2019
5 . 6 12/15/2021 B 13 11/14/2020 B 8 06/06/2019
;
data long;
set value;
array t {*} type:;
array v {*} value:;
array d {*} date:;
do i = 1 to dim(t);
type = t{i};
value = v{i};
date = d{i};
output;
end;
format date yymmdd10.;
keep case type value date;
run;
proc sort data=long;
by case date;
run;
data want;
do until (last.case);
set long;
by case;
select (type);
when ('V') do;
if _type ne 'V' or (_type = 'V' and _value lt value)
then do;
_type = 'V';
_value = value;
_date = date;
end;
end;
when ('C') do;
if _type in ('B','') or (_type = 'C' and (_value gt value or _value = .))
then do;
_type = 'C';
_value = value;
_date = date;
end;
end;
when ('B','') do;
if _type in ('B','') and (_value gt value or _value = .)
then do;
_type = type;
_value = value;
_date = date;
end;
end;
end;
end;
format _date yymmdd10.;
drop type value date;
rename
_type = type
_value = value
_date = date
;
run;
I don't think your rules are complete.
What do want as the answer when there are both V and C types?
Wouldn't this whole operation be a lot easier if you had a separated observation for each TYPE/VALUE/DATE combination?
Your rules and data don't align.
@sasuser123 wrote:
Using SAS arrays and do loop, I am trying to create new two columns Value_New and Date_New. The logic :
If type is V, then pick the highest value and Date associated with that V
If only type C exists, than take the lowest value and Date associated with that C
For any type, if there is only one type (only V or only C), and if values are equal, than take the one with earliest date.
Case Type1 Value1 Date1 Type2 Value2 Date2 Type3 Value3 Date3 Value_New Date_New 1 V 7 12/14/2021 C 12 11/13/2020 V 24 6/5/2019 24 6/5/2019 2 C 8 12/15/2021 V 13 11/14/2020 C 8 6/6/2019 8 <- not max? 6/6/2019
Both rows have a V but you take the minimum in the second case? I agree with others that a long format is more usuable here rather than a wide format but we do need clarification on the logic first.
Here's a rough idea to get you started but I wouldn't recommend this approach.
data have (drop = value_new date_new);
infile cards dlm='09'x;
informat case 8. type1 - type3 $8. Value1-Value3 Value_new 8. Date1-Date3 Date_new mmddyy10. ;
format case 8. type1 - type3 $8. Value1-Value3 Value_new 8. Date1-Date3 Date_new mmddyy10. ;
input Case Type1 Value1 Date1 Type2 Value2 Date2 Type3 Value3 Date3 Value_New Date_New;
cards;
1 V 7 12/14/2021 C 12 11/13/2020 V 24 6/5/2019 24 6/5/2019
2 C 8 12/15/2021 V 13 11/14/2020 C 8 6/6/2019 8 6/6/2019
3 V 7 12/14/2021 V 12 11/13/2020 V 24 6/5/2019 24 6/5/2019
4 C 8 12/15/2021 C 13 11/14/2020 C 8 6/6/2019 8 6/6/2019
;;;;
run;
proc print data=have;run;
data want;
set have;
array _val(3) value1-value3;
array _date(3) date1-date3;
array _type(3) type1-type3;
C_Found = 0; V_Found=0;
if whichc('V', of _type(*))>0 then V_Found = 1;
if whichc('C', of _type(*))>0 then C_Found = 1;
if V_FOUND then do;
value_new = max(of _val(*));
date_new = _date(whichn(value_new, of _val(*)));
end;
*have both V/C;
else do;
value_new = min(of _val(*));
date_new = _date(whichn(value_new, of _val(*)));
end;
format date_new mmddyys10.;
run;
proc print data=want;
run;
Thanks, this code is very helpful, I run it and identified additional info. I realized that there are blank and B types that I didn't include in the original post. V type is priority and need to have the highest value of V type. If V type doesn't exist, than I need C type with the lowest value of type C. When C or V don't exist then I can take any types with lowest value. When types are same and values are same, such as all types are C and values=8, then I need the one with the oldest date. I hope this makes sense.
Yes, I modified the code but it stopped picking the highest value for V and doesn't pick the oldest date if types and values are same.
data want;
set value;
array _val(3) value1-value3;
array _date(3) date1-date3;
array _type(3) type1-type3;
C_Found = 0; V_Found=0;b_Found = 0;
if whichc('V', of _type(*))>0 then V_Found = 1;
if whichc('C', of _type(*))>0 then C_Found = 1;
if whichc(' ', of _type(*))>0 then B_Found = 1;
if whichc('B', of _type(*))>0 then B_Found = 1;
if V_FOUND=1 then do;
value_new = max(of _val(*));
date_new = _date(whichn(value_new, of _val(*)));
end;
if C_FOUND=1 then do;
value_new = min(of _val(*));
date_new = _date(whichn(value_new, of _val(*)));
end;
else do;
value_new = min(of _val(*));
date_new = _date(whichn(value_new, of _val(*)));
end;
format date_new mmddyys10.;
run;
Technique when using a long dataset:
data value;
input Case Type1 $ Value1 Date1 mmddyy10. Type2 $ Value2 Date2 mmddyy10. Type3 $ Value3 Date3 mmddyy10.;
cards;
1 V 7 12/14/2021 C 12 11/13/2020 V 24 06/05/2019
2 C 8 12/15/2021 B 13 11/14/2020 C 9 06/06/2019
3 V 7 12/14/2021 V 7 11/13/2020 V 7 12/16/2021
4 C 8 12/15/2021 C 8 11/14/2020 C 8 06/06/2019
5 . 6 12/15/2021 B 13 11/14/2020 B 8 06/06/2019
;
data long;
set value;
array t {*} type:;
array v {*} value:;
array d {*} date:;
do i = 1 to dim(t);
type = t{i};
value = v{i};
date = d{i};
output;
end;
format date yymmdd10.;
keep case type value date;
run;
proc sort data=long;
by case date;
run;
data want;
do until (last.case);
set long;
by case;
select (type);
when ('V') do;
if _type ne 'V' or (_type = 'V' and _value lt value)
then do;
_type = 'V';
_value = value;
_date = date;
end;
end;
when ('C') do;
if _type in ('B','') or (_type = 'C' and (_value gt value or _value = .))
then do;
_type = 'C';
_value = value;
_date = date;
end;
end;
when ('B','') do;
if _type in ('B','') and (_value gt value or _value = .)
then do;
_type = type;
_value = value;
_date = date;
end;
end;
end;
end;
format _date yymmdd10.;
drop type value date;
rename
_type = type
_value = value
_date = date
;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.