Hi Guy's,
I have a dataset example (below) and I need to create 2 new variables
1. Max_Date_New
2. Variable_Name_New
I need to work out how to populate the new variables highlighted in RED.
If the count value is the same for multiple variables (like in the first row), I then need the max date from the date variables which is F_date so I need the 2 new variables to produce F_Count and F_Date.
I hope that makes sense.
Variable Name - Value | Variable Name - Date | Returned Variables | |||||||||||||
A_Count | B_Count | C_Count | D_Count | E_Count | F_Count | G_Count | A_Date | B_Date | C_Date | D_Date | E_Date | F_Date | G_Date | Max_Date_New | Variable_Name_New |
8 | 0 | 0 | 0 | 10 | 10 | 6 | 01/01/2017 | 01/02/2017 | . | . | 03/03/2017 | 04/03/2017 | 02/02/2017 | 04/03/2017 | F_Count |
9 | 1 | 1 | 10 | 6 | 4 | 1 | 02/01/2018 | 01/06/2016 | 05/06/2017 | 09/02/2017 | 11/11/2017 | 16/12/2017 | 08/03/2016 | 09/02/2017 | D_Count |
9 | 0 | 10 | 9 | 0 | 0 | 10 | 28/10/2016 | . | 25/01/2017 | 16/10/2017 | . | . | 22/06/2017 | 25/01/2017 | C_Count |
2 | 0 | 0 | 0 | 0 | 0 | 0 | 18/08/2016 | . | . | . | . | . | . | 18/08/2016 | A_Count |
2 | 4 | 6 | 5 | 8 | 0 | 0 | 24/04/2017 | 22/05/2016 | 17/07/2017 | 23/03/2016 | 28/07/2017 | . | . | 28/07/2017 | E_Count |
Something like this?
data have;
input A_Count B_Count C_Count D_Count E_Count F_Count G_Count (A_Date B_Date C_Date D_Date E_Date F_Date G_Date)(:ddmmyy10.);
format A_Date B_Date C_Date D_Date E_Date F_Date G_Date ddmmyy10.;
datalines;
8 0 0 0 10 10 6 01/01/2017 01/02/2017 . . 03/03/2017 04/03/2017 02/02/2017
9 1 1 10 6 4 1 02/01/2018 01/06/2016 05/06/2017 09/02/2017 11/11/2017 16/12/2017 08/03/2016
9 0 10 9 0 0 10 28/10/2016 . 25/01/2017 16/10/2017 . . 22/06/2017
2 0 0 0 0 0 0 18/08/2016 . . . . . .
2 4 6 5 8 0 0 24/04/2017 22/05/2016 17/07/2017 23/03/2016 28/07/2017 . .
;
data want;
set have;
c=0;
array counts{*} A_Count B_Count C_Count D_Count E_Count F_Count G_Count;
array dates{*} A_Date B_Date C_Date D_Date E_Date F_Date G_Date;
array maxdummies{*} dummy1-dummy7;
maxcount=max(of counts[*]);
maxpos=whichn(maxcount, of counts[*]);
do i=1 to dim(counts);
if counts[i]=maxcount then do;
c+1;
maxdummies[i]=1;
end;
end;
Variable_Name_New=vname(counts[maxpos]);
Max_Date_New=dates[maxpos];
if c>1 then do j=1 to dim(dates);
if maxdummies[j]=1 then do;
if dates[j] > Max_Date_New then do;
Max_Date_New=dates[j];
Variable_Name_New=vname(counts[j]);
end;
end;
end;
format max_date_new ddmmyy10.;
drop c i j maxcount maxpos dummy1-dummy7;
run;
data have;
input A_Count B_Count C_Count D_Count E_Count F_Count G_Count (A_Date B_Date C_Date D_Date E_Date F_Date G_Date)(:ddmmyy10.);
format A_Date B_Date C_Date D_Date E_Date F_Date G_Date ddmmyy10.;
datalines;
8 0 0 0 10 10 6 01/01/2017 01/02/2017 . . 03/03/2017 04/03/2017 02/02/2017
9 1 1 10 6 4 1 02/01/2018 01/06/2016 05/06/2017 09/02/2017 11/11/2017 16/12/2017 08/03/2016
9 0 10 9 0 0 10 28/10/2016 . 25/01/2017 16/10/2017 . . 22/06/2017
2 0 0 0 0 0 0 18/08/2016 . . . . . .
2 4 6 5 8 0 0 24/04/2017 22/05/2016 17/07/2017 23/03/2016 28/07/2017 . .
;
run;
proc contents data=have out=contents noprint;
run;
proc sql;
select name into :countvar separated by ' ' from contents where upcase(name) like '%COUNT';
select name into :datevar separated by ',' from contents where upcase(name) like '%DATE';;
quit;
%macro test(dsn,vars,func);
data &dsn;
set &dsn;
format Max_Date_New date9.;
array list(*) &vars;
Max_Date_New=&func(&datevar.);
Variable_Name_New = vname(list[whichn(&func(of list[*]), of list[*])]);
run;
%mend test;
/** retrieve maximum value from a b and c **/
%test(have,&countvar.,max);
Doc: http://support.sas.com/kb/46/471.html
This Works.
Something like this?
data have;
input A_Count B_Count C_Count D_Count E_Count F_Count G_Count (A_Date B_Date C_Date D_Date E_Date F_Date G_Date)(:ddmmyy10.);
format A_Date B_Date C_Date D_Date E_Date F_Date G_Date ddmmyy10.;
datalines;
8 0 0 0 10 10 6 01/01/2017 01/02/2017 . . 03/03/2017 04/03/2017 02/02/2017
9 1 1 10 6 4 1 02/01/2018 01/06/2016 05/06/2017 09/02/2017 11/11/2017 16/12/2017 08/03/2016
9 0 10 9 0 0 10 28/10/2016 . 25/01/2017 16/10/2017 . . 22/06/2017
2 0 0 0 0 0 0 18/08/2016 . . . . . .
2 4 6 5 8 0 0 24/04/2017 22/05/2016 17/07/2017 23/03/2016 28/07/2017 . .
;
data want;
set have;
c=0;
array counts{*} A_Count B_Count C_Count D_Count E_Count F_Count G_Count;
array dates{*} A_Date B_Date C_Date D_Date E_Date F_Date G_Date;
array maxdummies{*} dummy1-dummy7;
maxcount=max(of counts[*]);
maxpos=whichn(maxcount, of counts[*]);
do i=1 to dim(counts);
if counts[i]=maxcount then do;
c+1;
maxdummies[i]=1;
end;
end;
Variable_Name_New=vname(counts[maxpos]);
Max_Date_New=dates[maxpos];
if c>1 then do j=1 to dim(dates);
if maxdummies[j]=1 then do;
if dates[j] > Max_Date_New then do;
Max_Date_New=dates[j];
Variable_Name_New=vname(counts[j]);
end;
end;
end;
format max_date_new ddmmyy10.;
drop c i j maxcount maxpos dummy1-dummy7;
run;
@Brandon16, I get a slightly different result than you desire, but I fail to see the logic to why the Max_Date_New value in row 3 should be 25/01/2017 and not 22/06/2017?
Shouldn't it be the maximum of the dates that resemble the tied maximum count values?
What "first statement" are you refering to?
data have;
input A_Count B_Count C_Count D_Count E_Count F_Count G_Count (A_Date B_Date C_Date D_Date E_Date F_Date G_Date)(:ddmmyy10.);
format A_Date B_Date C_Date D_Date E_Date F_Date G_Date ddmmyy10.;
datalines;8 0 0 0 10 10 6 01/01/2017 01/02/2017 . . 03/03/2017 04/03/2017 02/02/2017
9 1 1 10 6 4 1 02/01/2018 01/06/2016 05/06/2017 09/02/2017 11/11/2017 16/12/2017 08/03/2016
9 0 10 9 0 0 10 28/10/2016 . 25/01/2017 16/10/2017 . . 22/06/2017
2 0 0 0 0 0 0 18/08/2016 . . . . . .
2 4 6 5 8 0 0 24/04/2017 22/05/2016 17/07/2017 23/03/2016 28/07/2017 . .;
I simply transformed your inserted sample data to a data step, so you can insert my entire code and see that you get the desired result.
Makes sense?
Are you sure you ran the entire code I posted.
Please run this and post the result
proc print data=want;
var max_date_new Variable_Name_New;
run;
Another issue is I should see the following results returned below........
A_Count | B_Count | C_Count | D_Count | E_Count | F_Count | G_Count | A_Date | B_Date | C_Date | D_Date | E_Date | F_Date | G_Date | Max_Date_New | Variable_Name_New |
8 | 0 | 8 | 8 | 8 | 0 | 0 | 20/04/2015 | . | 18/03/2017 | 18/03/2017 | . | . | . | 20/04/2015 | A_Count |
Instead I get this… | |||||||||||||||
A_Count | B_Count | C_Count | D_Count | E_Count | F_Count | G_Count | A_Date | B_Date | C_Date | D_Date | E_Date | F_Date | G_Date | Max_Date_New | Variable_Name_New |
8 | 0 | 8 | 8 | 8 | 0 | 0 | 20/04/2015 | . | 18/03/2013 | 18/03/2013 | . | . | . | 20/04/2015 | C_Count |
I do not follow your logic? Why do you want Max_Date_New to be 20/04/2015 when 18/03/2017 is larger (C_Date).
Isn't that the whole point of your logic, that you want the date values to be the tie breakers, when there are more than 1 max value i your count variables?
Sorry, I hadn't updated the table correctly, the years are 2013 NOT 2017. So C_date and D_date are both 18/03/2013 not 18/03/2017.
if any of the values match within A_count - G counts, I want the value that displays the latest date
So in this example, as A_Count relates to A_date (which has the latest date) I want Variable_New_Name to show A_COUNT
My code does that exactly? Please run the below code
data have;
input A_Count B_Count C_Count D_Count E_Count F_Count G_Count (A_Date B_Date C_Date D_Date E_Date F_Date G_Date)(:ddmmyy10.);
format A_Date B_Date C_Date D_Date E_Date F_Date G_Date ddmmyy10.;
datalines;
8 0 8 8 8 0 0 20/04/2015 . 18/03/2013 18/03/2013 . . .
;
data want;
set have;
c=0;
array counts{*} A_Count B_Count C_Count D_Count E_Count F_Count G_Count;
array dates{*} A_Date B_Date C_Date D_Date E_Date F_Date G_Date;
array maxdummies{*} dummy1-dummy7;
maxcount=max(of counts[*]);
maxpos=whichn(maxcount, of counts[*]);
do i=1 to dim(counts);
if counts[i]=maxcount then do;
c+1;
maxdummies[i]=1;
end;
end;
Variable_Name_New=vname(counts[maxpos]);
Max_Date_New=dates[maxpos];
if c>1 then do j=1 to dim(dates);
if maxdummies[j]=1 then do;
if dates[j] > Max_Date_New then do;
Max_Date_New=dates[j];
Variable_Name_New=vname(counts[j]);
end;
end;
end;
format max_date_new ddmmyy10.;
drop c i j maxcount maxpos dummy1-dummy7;
run;
proc print data=want;
var Max_Date_New Variable_Name_New;
run;
It gives you the output
Which is exactly what you are asking for?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.