I have a table in which you'll find the document_handler, kind_of_document, r_creation_date and document_state. The table has been prepared in advance based on kind_of_document "AVE", so this column can be ignored. Right now I need to make a script that counts how many documents each handler has on hold and on to do. The script below manages to do that, however, I still need to find the oldest r_creation_date for each document state for each handler. That is something else...
The table:
document_handler | kind_of_document | r_creation_date | document_state |
Name1 | AVE | 3/03/2016 | HOLD |
Name1 | AVE | 29/03/2016 | HOLD |
Name1 | AVE | 23/03/2016 | HOLD |
Name1 | AVE | 23/02/2016 | HOLD |
Name1 | AVE | 9/03/2016 | HOLD |
Name2 | AVE | 1/03/2016 | HOLD |
Name2 | AVE | 25/08/2015 | HOLD |
Name2 | AVE | 27/02/2016 | HOLD |
Name2 | AVE | 1/03/2016 | HOLD |
Name2 | AVE | 22/02/2016 | HOLD |
Name2 | AVE | 22/03/2016 | HOLD |
Name2 | AVE | 29/01/2016 | HOLD |
Name2 | AVE | 22/03/2016 | HOLD |
Name2 | AVE | 22/03/2016 | HOLD |
Name2 | AVE | 20/02/2016 | HOLD |
Name2 | AVE | 7/03/2016 | HOLD |
Name2 | AVE | 14/03/2016 | HOLD |
Name2 | AVE | 2/03/2016 | HOLD |
Name2 | AVE | 26/01/2016 | HOLD |
Name2 | AVE | 1/03/2016 | HOLD |
Name2 | AVE | 19/02/2016 | HOLD |
Name2 | AVE | 29/03/2016 | TODO |
Name2 | AVE | 29/03/2016 | TODO |
Name2 | AVE | 31/03/2016 | TODO |
Name2 | AVE | 31/03/2016 | TODO |
Name2 | AVE | 30/03/2016 | TODO |
Name3 | AVE | 14/01/2016 | HOLD |
Name3 | AVE | 1/03/2016 | HOLD |
Name3 | AVE | 8/02/2016 | HOLD |
Name3 | AVE | 17/02/2016 | HOLD |
Name3 | AVE | 29/01/2016 | HOLD |
Name3 | AVE | 2/02/2016 | HOLD |
Name3 | AVE | 24/02/2016 | HOLD |
Name3 | AVE | 23/02/2016 | HOLD |
Name3 | AVE | 29/02/2016 | TODO |
Name3 | AVE | 15/03/2016 | TODO |
The code:
data want;
set have;
by document_handler;
retain
hold
todo
;
if first.document_handler
then do;
hold=0;
todo=0;
end;
select (document_state);
when ('HOLD') hold+1;
when ('TODO') todo+1;
otherwise ignore+1;
end;
if last.document_handler then output;
keep document_handler hold todo;
run;
How can I tell SAS that it has to take the "min(r_creation_date)" for each document_state for each document_handler? Perhaps this doesn't require a datastep, but just a proc sql?
As noted, there is no test data, so the code is untested. To get tested code, post test data in the form of a datastep.
data want (keep=document_handler hold todo); set have; by document_handler; retain hold todo first_date1 first_date2; if first.document_handler then do; hold=0; todo=0; first_date1=.; first_date2=.; end; select (document_state); when ('HOLD') do; hold+1; first_date1=ifn(first_date1=. or r_creation_date < first_date1,r_creation_date,first_date1); end; when ('TODO') do; todo+1; first_date2=ifn(first_date2=. or r_creation_date < first_date2,r_creation_date,first_date2); end; otherwise ignore+1; end; if last.document_handler then output; run;
Just add another retained value, on first obs set to date, then if any subsequent is less then set them:
data want (keep document_handler hold todo); set have; by document_handler; retain hold todo first_date; if first.document_handler then do; hold=0; todo=0; first_date=r_creation_date; end; select (document_state); when ('HOLD') hold+1; when ('TODO') todo+1; otherwise ignore+1; end; if r_creation_date < first_date then first_date=r_creation_date; if last.document_handler then output; run;
The result I get, is the same I got before.
document_handler | hold | todo |
Name1 | 5 | 3 |
Name2 | 11 | 6 |
Name3 | 2 | 9 |
The result I should get, requires two dates (one for each state). For example:
document_handler | hold | date1 | todo | date2 |
Name1 | 5 | 31/03/2016 | 3 | 27/03/2016 |
Name2 | 11 | 28/03/2016 | 6 | 27/03/2016 |
Name3 | 2 | 5/03/2016 | 9 | 15/03/2016 |
Date1 should be the oldest document in HOLD for each handler (Name1, Name2, Name3,...).
Date2 should be the oldest document in TODO for each handler (Name1, Name2, Name3,...).
Suggestions?
Have two variables, and do the calculation in the select block then, sorry, I ddin't see that requirement first time. Can't test this as no test data (datastep with example data):
data want (keep document_handler hold todo); set have; by document_handler; retain hold todo first_date1 first_date2; if first.document_handler then do; hold=0; todo=0; first_date1=.; first_date2=.; end; select (document_state); when ('HOLD') do; hold+1; ifn(first_date1=. or r_creation_date < first_date1,r_creation_date,first_date1); end; when ('TODO') do; todo+1; ifn(first_date2=. or r_creation_date < first_date2,r_creation_date,first_date2); end; otherwise ignore+1; end; if last.document_handler then output; run;
I used your code and I received a few errors. The first one was the absense of the "=" in the keep statement. After correcting that, I still receive some "ifn" errors, which I am not familiar with.
I am using the exact same code as posted above.
Log:
1254 data test1 (keep = document_handler hold todo);
1255 set ave_1;
1256 by document_handler;
1257 retain hold todo first_date1 first_date2;
1258 if first.document_handler then do;
1259 hold=0;
1260 todo=0;
1261 first_date1=.;
1262 first_date2=.;
1263 end;
1264 select (document_state);
1265 when ('HOLD') do;
1266 hold+1;
1267 ifn(first_date1=. or r_creation_date < first_date1,r_creation_date,first_date1);
-
22
76
ERROR: Undeclared array referenced: ifn.
ERROR 22-322: Syntax error, expecting one of the following: +, =.
ERROR 76-322: Syntax error, statement will be ignored.
1268 end;
1269 when ('TODO') do;
1270 todo+1;
1271 ifn(first_date2=. or r_creation_date < first_date2,r_creation_date,first_date2);
-
22
76
ERROR: Undeclared array referenced: ifn.
ERROR 22-322: Syntax error, expecting one of the following: +, =.
ERROR 76-322: Syntax error, statement will be ignored.
1272 end;
1273 otherwise ignore+1;
1274 end;
1275 if last.document_handler then output;
1276 run;
NOTE: Character values have been converted to numeric values at the places given by:
(Line):(Column).
1267:28 1271:28
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TEST1 may be incomplete. When this step was stopped there were 0
observations and 3 variables.
WARNING: Data set WORK.TEST1 was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.04 seconds
As noted, there is no test data, so the code is untested. To get tested code, post test data in the form of a datastep.
data want (keep=document_handler hold todo); set have; by document_handler; retain hold todo first_date1 first_date2; if first.document_handler then do; hold=0; todo=0; first_date1=.; first_date2=.; end; select (document_state); when ('HOLD') do; hold+1; first_date1=ifn(first_date1=. or r_creation_date < first_date1,r_creation_date,first_date1); end; when ('TODO') do; todo+1; first_date2=ifn(first_date2=. or r_creation_date < first_date2,r_creation_date,first_date2); end; otherwise ignore+1; end; if last.document_handler then output; run;
This script doesn't give me errors, however, it is still missing the dates and that's due to a format issue.
I have a CSV-file containing this information and much more.
In the CSV-file you will find two columns with dates, namely date_treated and r_creation_date. Both dates are in the format of "dd/mm/yyyy hh:mm". I used "format datetime16." and "informat "anydtdtm16." on both columns, but the result in both columns differs. The date_treated is imported correctly according to the format "dd/mm/yyyy hh:mm" and the r_creation_date suddenly used the format "ddmmmyyyy". Since I am using the Belgian Period region and language settings, that becomes 01mrt2016 for the first of March, 2016. If the import were to do what actually supposed to do, this problem wouldn't be happening at all, I figure.
What could be a cause to this import issue?
data test;
input document_handler $ kind_of_document $ r_creation_date ddmmyy10. document_state $;
datalines;
Name1 AVE 3/03/2016 HOLD
Name1 AVE 29/03/2016 HOLD
Name1 AVE 23/03/2016 HOLD
Name1 AVE 23/02/2016 HOLD
Name1 AVE 9/03/2016 HOLD
Name2 AVE 1/03/2016 HOLD
Name2 AVE 25/08/2015 HOLD
Name2 AVE 27/02/2016 HOLD
Name2 AVE 1/03/2016 HOLD
Name2 AVE 22/02/2016 HOLD
Name2 AVE 22/03/2016 HOLD
Name2 AVE 29/01/2016 HOLD
Name2 AVE 22/03/2016 HOLD
Name2 AVE 22/03/2016 HOLD
Name2 AVE 20/02/2016 HOLD
Name2 AVE 7/03/2016 HOLD
Name2 AVE 14/03/2016 HOLD
Name2 AVE 2/03/2016 HOLD
Name2 AVE 26/01/2016 HOLD
Name2 AVE 1/03/2016 HOLD
Name2 AVE 19/02/2016 HOLD
Name2 AVE 29/03/2016 TODO
Name2 AVE 29/03/2016 TODO
Name2 AVE 31/03/2016 TODO
Name2 AVE 31/03/2016 TODO
Name2 AVE 30/03/2016 TODO
Name3 AVE 14/01/2016 HOLD
Name3 AVE 1/03/2016 HOLD
Name3 AVE 8/02/2016 HOLD
Name3 AVE 17/02/2016 HOLD
Name3 AVE 29/01/2016 HOLD
Name3 AVE 2/02/2016 HOLD
Name3 AVE 24/02/2016 HOLD
Name3 AVE 23/02/2016 HOLD
Name3 AVE 29/02/2016 TODO
Name3 AVE 15/03/2016 TODO
;
run;
data test;
set test;
format r_creation_date ddmmyy10.;
run;
PROC SQL;
CREATE TABLE test1 AS
select document_handler, kind_of_document, document_state, count(*) as count_documents, min(r_creation_date) as oldest_r_creation_date format ddmmyy10.
from test a
group by document_handler,kind_of_document, document_state
;
QUIT;
- Cheers -
I managed to fix it by deleting the table and importing it again. Now it r_creation_date has the format datetime16. and it is in t he region language US. (Example: 20OCT2015 16:54)
The only problem now is that your original code keeps outputting only three variables (document_handler hold todo). Even if I add "keep document_handler hold first_date1 do first_date2;" in after output and before run, it still only outputs only three variables?
Soilved by adding them to your keep stateming at the start of the code. Now I need to maintain my format in the datastep, cause it changed it to a numerical date. I guess that would be first_date1 format=datetime16. in de set statement of the code?
Any idea as to where to give the date variables the format datetime16.? I can't seem to manage that?
In the data step, I usually place at last place - before run or first, right after Set.
You our have a date time variable
Format variable_name datetime16.;
Or if you want it to display as a date
Format variable_name dtdate9.;
I have a huge CSV-file which I import as a whole to my unload library in SAS. In the import script I use format datetime16. and I use informat anydtdtm16. for every column containing dates.
If i check the table in the unload library, the CSV-file is imported correctly and the dates are in the format assigned above.
Next up I will use proc sql to filter out the data I don't need and to create a clean table only containing the information needed for this assignement. During this proc sql I also assign a format datetime16. to r_creation_date and when I check the clean table, the dates are in the formats required.
As soon as I use your part of the script to do the calculations, the dates end up in a numerical format.
Below you will find the entire code. The only thing I need to manage, is to maintain dates and not a numerical variable.
/* Collect all AVE documents in state HOLD and TODO */
proc sql;
create table ave_1 as
select document_handler, kind_of_document, r_creation_date format=datetime16., document_state
from unload.document
where kind_of_document="AVE"
and document_state in ("HOLD","TODO")
;
quit;
/* Sort on document_state */
proc sort in=ave_1 out=ave_1;
by document_state;
run;
/* Sort on document_handler. */
proc sort in=ave_1 out=ave_1;
by document_handler;
run;
/* Count all documents per state per handler and register oldest date per state per handler */
data ave_2 (keep=document_handler hold first_date1 todo first_date2);
set ave_1;
by document_handler;
retain hold first_date1 todo first_date2;
if first.document_handler then do;
hold=0;
first_date1=.;
todo=0;
first_date2=.;
end;
select (document_state);
when ('HOLD') do;
hold+1;
first_date1=ifn(first_date1=. or r_creation_date < first_date1,r_creation_date,first_date1);
end;
when ('TODO') do;
todo+1;
first_date2=ifn(first_date2=. or r_creation_date < first_date2,r_creation_date,first_date2);
end;
otherwise ignore+1;
end;
if last.document_handler then output;
keep document_handler hold first_date1 todo first_date2;
run;
That did the trick. Thanks again RW9 and Reeza!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.