Hi Guy's,
I have a dataset example (below) and I need to create 2 new variables
1. Max_Date_New
2. Variable_Name_New
I need to work out how to populate the new variables highlighted in RED.
If the count value is the same for multiple variables (like in the first row), I then need the max date from the date variables which is F_date so I need the 2 new variables to produce F_Count and F_Date.
I hope that makes sense.
Variable Name - Value | Variable Name - Date | Returned Variables | |||||||||||||
A_Count | B_Count | C_Count | D_Count | E_Count | F_Count | G_Count | A_Date | B_Date | C_Date | D_Date | E_Date | F_Date | G_Date | Max_Date_New | Variable_Name_New |
8 | 0 | 0 | 0 | 10 | 10 | 6 | 01/01/2017 | 01/02/2017 | . | . | 03/03/2017 | 04/03/2017 | 02/02/2017 | 04/03/2017 | F_Count |
9 | 1 | 1 | 10 | 6 | 4 | 1 | 02/01/2018 | 01/06/2016 | 05/06/2017 | 09/02/2017 | 11/11/2017 | 16/12/2017 | 08/03/2016 | 09/02/2017 | D_Count |
9 | 0 | 10 | 9 | 0 | 0 | 10 | 28/10/2016 | . | 25/01/2017 | 16/10/2017 | . | . | 22/06/2017 | 25/01/2017 | C_Count |
2 | 0 | 0 | 0 | 0 | 0 | 0 | 18/08/2016 | . | . | . | . | . | . | 18/08/2016 | A_Count |
2 | 4 | 6 | 5 | 8 | 0 | 0 | 24/04/2017 | 22/05/2016 | 17/07/2017 | 23/03/2016 | 28/07/2017 | . | . | 28/07/2017 | E_Count |
Something like this?
data have;
input A_Count B_Count C_Count D_Count E_Count F_Count G_Count (A_Date B_Date C_Date D_Date E_Date F_Date G_Date)(:ddmmyy10.);
format A_Date B_Date C_Date D_Date E_Date F_Date G_Date ddmmyy10.;
datalines;
8 0 0 0 10 10 6 01/01/2017 01/02/2017 . . 03/03/2017 04/03/2017 02/02/2017
9 1 1 10 6 4 1 02/01/2018 01/06/2016 05/06/2017 09/02/2017 11/11/2017 16/12/2017 08/03/2016
9 0 10 9 0 0 10 28/10/2016 . 25/01/2017 16/10/2017 . . 22/06/2017
2 0 0 0 0 0 0 18/08/2016 . . . . . .
2 4 6 5 8 0 0 24/04/2017 22/05/2016 17/07/2017 23/03/2016 28/07/2017 . .
;
data want;
set have;
c=0;
array counts{*} A_Count B_Count C_Count D_Count E_Count F_Count G_Count;
array dates{*} A_Date B_Date C_Date D_Date E_Date F_Date G_Date;
array maxdummies{*} dummy1-dummy7;
maxcount=max(of counts[*]);
maxpos=whichn(maxcount, of counts[*]);
do i=1 to dim(counts);
if counts[i]=maxcount then do;
c+1;
maxdummies[i]=1;
end;
end;
Variable_Name_New=vname(counts[maxpos]);
Max_Date_New=dates[maxpos];
if c>1 then do j=1 to dim(dates);
if maxdummies[j]=1 then do;
if dates[j] > Max_Date_New then do;
Max_Date_New=dates[j];
Variable_Name_New=vname(counts[j]);
end;
end;
end;
format max_date_new ddmmyy10.;
drop c i j maxcount maxpos dummy1-dummy7;
run;
data have;
input A_Count B_Count C_Count D_Count E_Count F_Count G_Count (A_Date B_Date C_Date D_Date E_Date F_Date G_Date)(:ddmmyy10.);
format A_Date B_Date C_Date D_Date E_Date F_Date G_Date ddmmyy10.;
datalines;
8 0 0 0 10 10 6 01/01/2017 01/02/2017 . . 03/03/2017 04/03/2017 02/02/2017
9 1 1 10 6 4 1 02/01/2018 01/06/2016 05/06/2017 09/02/2017 11/11/2017 16/12/2017 08/03/2016
9 0 10 9 0 0 10 28/10/2016 . 25/01/2017 16/10/2017 . . 22/06/2017
2 0 0 0 0 0 0 18/08/2016 . . . . . .
2 4 6 5 8 0 0 24/04/2017 22/05/2016 17/07/2017 23/03/2016 28/07/2017 . .
;
run;
proc contents data=have out=contents noprint;
run;
proc sql;
select name into :countvar separated by ' ' from contents where upcase(name) like '%COUNT';
select name into :datevar separated by ',' from contents where upcase(name) like '%DATE';;
quit;
%macro test(dsn,vars,func);
data &dsn;
set &dsn;
format Max_Date_New date9.;
array list(*) &vars;
Max_Date_New=&func(&datevar.);
Variable_Name_New = vname(list[whichn(&func(of list[*]), of list[*])]);
run;
%mend test;
/** retrieve maximum value from a b and c **/
%test(have,&countvar.,max);
Doc: http://support.sas.com/kb/46/471.html
This Works.
Something like this?
data have;
input A_Count B_Count C_Count D_Count E_Count F_Count G_Count (A_Date B_Date C_Date D_Date E_Date F_Date G_Date)(:ddmmyy10.);
format A_Date B_Date C_Date D_Date E_Date F_Date G_Date ddmmyy10.;
datalines;
8 0 0 0 10 10 6 01/01/2017 01/02/2017 . . 03/03/2017 04/03/2017 02/02/2017
9 1 1 10 6 4 1 02/01/2018 01/06/2016 05/06/2017 09/02/2017 11/11/2017 16/12/2017 08/03/2016
9 0 10 9 0 0 10 28/10/2016 . 25/01/2017 16/10/2017 . . 22/06/2017
2 0 0 0 0 0 0 18/08/2016 . . . . . .
2 4 6 5 8 0 0 24/04/2017 22/05/2016 17/07/2017 23/03/2016 28/07/2017 . .
;
data want;
set have;
c=0;
array counts{*} A_Count B_Count C_Count D_Count E_Count F_Count G_Count;
array dates{*} A_Date B_Date C_Date D_Date E_Date F_Date G_Date;
array maxdummies{*} dummy1-dummy7;
maxcount=max(of counts[*]);
maxpos=whichn(maxcount, of counts[*]);
do i=1 to dim(counts);
if counts[i]=maxcount then do;
c+1;
maxdummies[i]=1;
end;
end;
Variable_Name_New=vname(counts[maxpos]);
Max_Date_New=dates[maxpos];
if c>1 then do j=1 to dim(dates);
if maxdummies[j]=1 then do;
if dates[j] > Max_Date_New then do;
Max_Date_New=dates[j];
Variable_Name_New=vname(counts[j]);
end;
end;
end;
format max_date_new ddmmyy10.;
drop c i j maxcount maxpos dummy1-dummy7;
run;
@Brandon16, I get a slightly different result than you desire, but I fail to see the logic to why the Max_Date_New value in row 3 should be 25/01/2017 and not 22/06/2017?
Shouldn't it be the maximum of the dates that resemble the tied maximum count values?
What "first statement" are you refering to?
data have;
input A_Count B_Count C_Count D_Count E_Count F_Count G_Count (A_Date B_Date C_Date D_Date E_Date F_Date G_Date)(:ddmmyy10.);
format A_Date B_Date C_Date D_Date E_Date F_Date G_Date ddmmyy10.;
datalines;8 0 0 0 10 10 6 01/01/2017 01/02/2017 . . 03/03/2017 04/03/2017 02/02/2017
9 1 1 10 6 4 1 02/01/2018 01/06/2016 05/06/2017 09/02/2017 11/11/2017 16/12/2017 08/03/2016
9 0 10 9 0 0 10 28/10/2016 . 25/01/2017 16/10/2017 . . 22/06/2017
2 0 0 0 0 0 0 18/08/2016 . . . . . .
2 4 6 5 8 0 0 24/04/2017 22/05/2016 17/07/2017 23/03/2016 28/07/2017 . .;
I simply transformed your inserted sample data to a data step, so you can insert my entire code and see that you get the desired result.
Makes sense?
Are you sure you ran the entire code I posted.
Please run this and post the result
proc print data=want;
var max_date_new Variable_Name_New;
run;
Another issue is I should see the following results returned below........
A_Count | B_Count | C_Count | D_Count | E_Count | F_Count | G_Count | A_Date | B_Date | C_Date | D_Date | E_Date | F_Date | G_Date | Max_Date_New | Variable_Name_New |
8 | 0 | 8 | 8 | 8 | 0 | 0 | 20/04/2015 | . | 18/03/2017 | 18/03/2017 | . | . | . | 20/04/2015 | A_Count |
Instead I get this… | |||||||||||||||
A_Count | B_Count | C_Count | D_Count | E_Count | F_Count | G_Count | A_Date | B_Date | C_Date | D_Date | E_Date | F_Date | G_Date | Max_Date_New | Variable_Name_New |
8 | 0 | 8 | 8 | 8 | 0 | 0 | 20/04/2015 | . | 18/03/2013 | 18/03/2013 | . | . | . | 20/04/2015 | C_Count |
I do not follow your logic? Why do you want Max_Date_New to be 20/04/2015 when 18/03/2017 is larger (C_Date).
Isn't that the whole point of your logic, that you want the date values to be the tie breakers, when there are more than 1 max value i your count variables?
Sorry, I hadn't updated the table correctly, the years are 2013 NOT 2017. So C_date and D_date are both 18/03/2013 not 18/03/2017.
if any of the values match within A_count - G counts, I want the value that displays the latest date
So in this example, as A_Count relates to A_date (which has the latest date) I want Variable_New_Name to show A_COUNT
My code does that exactly? Please run the below code
data have;
input A_Count B_Count C_Count D_Count E_Count F_Count G_Count (A_Date B_Date C_Date D_Date E_Date F_Date G_Date)(:ddmmyy10.);
format A_Date B_Date C_Date D_Date E_Date F_Date G_Date ddmmyy10.;
datalines;
8 0 8 8 8 0 0 20/04/2015 . 18/03/2013 18/03/2013 . . .
;
data want;
set have;
c=0;
array counts{*} A_Count B_Count C_Count D_Count E_Count F_Count G_Count;
array dates{*} A_Date B_Date C_Date D_Date E_Date F_Date G_Date;
array maxdummies{*} dummy1-dummy7;
maxcount=max(of counts[*]);
maxpos=whichn(maxcount, of counts[*]);
do i=1 to dim(counts);
if counts[i]=maxcount then do;
c+1;
maxdummies[i]=1;
end;
end;
Variable_Name_New=vname(counts[maxpos]);
Max_Date_New=dates[maxpos];
if c>1 then do j=1 to dim(dates);
if maxdummies[j]=1 then do;
if dates[j] > Max_Date_New then do;
Max_Date_New=dates[j];
Variable_Name_New=vname(counts[j]);
end;
end;
end;
format max_date_new ddmmyy10.;
drop c i j maxcount maxpos dummy1-dummy7;
run;
proc print data=want;
var Max_Date_New Variable_Name_New;
run;
It gives you the output
Which is exactly what you are asking for?
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.