Hello everyone,
I'm trying to automatize a simple yet time-consuming task.
Everyday I have new data on interviews. I need to keep track of how many interviews are created each day and how many of these interviews are closed.
What I have (variables):
Created (date of creation, e.g. "30/01/2018")
closetim (date when interview is completed, e.g. "02/02/2018")
Now, what I'm trying to accomplish is to create a dataset with 4 columns:
1st column: which tells me the date of when the data was downloaded (e.g. 03/02/2018),
2nd column = "Created" date
3rd column = "count of interviews with date = "created" date
4th column = "count of observations that has been closed (closetim not equal to missing) where date = "created" date"
Then, each day I can run the code and append new counts of interviews created and closed to an access database. I have tried to visualise it in the attached screendumps:
Any help is much appreciated,
Kind regards, Kristian
I'm using SAS Base 9.4 at work (student assistant), and sas university edition at home.
My code so far:
DATA have;
SET have;
DATA have;
SET have;
created = datepart(created);
closetim = datepart(closetim);
ATTRIB created FORMAT = ddmmyy8.;
ATTRIB closetim FORMAT = ddmmyy8.;
RUN;
PROC SQL;
CREATE TABLE newdata AS
SELECT Created, closetim
FROM have
ORDER BY created;
QUIT;
DATA newdata;
set newdata;
downloaddate = today();
ATTRIB downloaddate FORMAT = ddmmyy8.;
RUN;
DATA newdata;
SET newdata;
c_created = count(created);
RUN;
DATA newdata;
SET newdata;
if closetim ^= . then c_closetim = "closed";
RUN;
Not much wizardry required. A simple SQL should do the job. Below code uses your sample data.
DATA have;
LENGTH
created 8
closetim 8 ;
FORMAT
created DDMMYY10.
closetim DDMMYY10. ;
INFORMAT
created DDMMYY10.
closetim DDMMYY10. ;
INFILE 'C:\temp\data.csv'
firstobs=2
DLM=';'
truncover
DSD ;
INPUT
created
closetim ;
RUN;
proc sql feedback;
create table want as
select
today() as date_downloaded format=ddmmyy10.,
created,
count(created) as count_created,
count(closetim) as count_closed
from have
group by created
;
quit;
Ideally the downloaded date doesn't get populated using the today() function as that's the date when you execute the code and not necessarily when you've downloaded the data. If you've got the date of the data somewhere in the filename of the downloaded data or in the data itself then that's where you should retrieve the value for "downloaded date" from.
As you just need a simple SQL to aggregate your data I believe you could also do everything directly within MS Access (if you want to).
Although I don't completely follow your requirements, it looks like you need to summarize your data on "Created", and also on "Created", but with an indication that the interview is closed. PROC MEANS and PROC SQL are good choices to do this. The summarized results then need to be merged back with your original dataset. PROC SQL is perfect for this.
Because you posted a picture of your data, nobody can work with it. Take a look at the advice for new posters, and provide your data in a more usable format. Then people can try things, and make suggestions.
Tom
Thank you for your answer, Tom.
I'll try to restate my problem.
The data I have is just about similar to regular paneldata, i.e. I'm tracking how many interviews, created on a given day, have been closed over multiple days/months.
Each day I'm downloading the same data. I would expect that after a certain period, say a week, new interviews have been created and some amount of the interviews created in the past - that hasn't been closed - will be now be closed. This is essentially what I would like to create, as pictured in the attachement.
I followed your advise and have attached the data which I downloaded today. Hopefully, some SAS wizard is able to get me to the next step.
Kind regards,
Kristian
Where does data download date come from? Is the data fully refreshed each time or do you have to append the results from each month? I would guess it's a fully refresh each month?
Not much wizardry required. A simple SQL should do the job. Below code uses your sample data.
DATA have;
LENGTH
created 8
closetim 8 ;
FORMAT
created DDMMYY10.
closetim DDMMYY10. ;
INFORMAT
created DDMMYY10.
closetim DDMMYY10. ;
INFILE 'C:\temp\data.csv'
firstobs=2
DLM=';'
truncover
DSD ;
INPUT
created
closetim ;
RUN;
proc sql feedback;
create table want as
select
today() as date_downloaded format=ddmmyy10.,
created,
count(created) as count_created,
count(closetim) as count_closed
from have
group by created
;
quit;
Ideally the downloaded date doesn't get populated using the today() function as that's the date when you execute the code and not necessarily when you've downloaded the data. If you've got the date of the data somewhere in the filename of the downloaded data or in the data itself then that's where you should retrieve the value for "downloaded date" from.
As you just need a simple SQL to aggregate your data I believe you could also do everything directly within MS Access (if you want to).
Hi Patrick,
This is exactly what I was looking for. Thank you for your time 🙂
The data is downloaded each morning at 02:00 and the SAS-code will enter a setup that runs each morning at 05:00 so it should be okay.
I always try to design in a way that allows for reprocessing of data or for catch-up runs. Right now you might not see any reason for this... but you never know (i.e. you're sick for a few days and can't process your data).
For this reason I'd never use the execution date but I'd ensure that I've got the "data date" somewhere in the actual data or then in the filename itself, i.e. <filename_yyyymmdd.csv - something you could implement as part of your download process.
It’s not in the file name? Or maybe it’s in the file properties, or created date?
@kristiantdc wrote:
@Patrick
That’s an excellent point. I’m not sure whether the possibility of extracting such a date exist in the current setup (the data source comes from an external supplier and thus I’m limited to the variables they can ‘create’, as per say an ‘in-data date’ variable). I’ll definitely check with our supplier tomorrow. Thank you for the advice!
The metadata of the .csv file contains the creation date. But how I extract this "created" date of the file and put it into downloaded_date with SAS, I have absolutely no idea about.
Well, that part I can help with.
Tom
data DatasetDates;
length FileRef $8 OptionName $20 OptionVal $1024;
keep OptionName OptionVal;
/* Assign the fileref */
call missing(FileRef);
/* Blank, so SAS will assign a file name */
rc1=filename(FileRef, "/SomeDirectory/SomeFile.csv");
/* Associate the file name with the dataset or directory */
if rc1 ^=0 then
abort;
/* Open the entry for access by SAS. If it is a directory, zero will be returned */
EntryID=fopen(FileRef);
/* Get the number of options for the dataset */
OptNum=foptnum(EntryID);
if OptNum=. then
abort;
/* Get all of the options for the dataset */
do OptCount=1 to OptNum;
OptionName=foptname(EntryID, OptCount);
OptionVal=finfo(EntryID, OptionName);
if OptionName="Last Modified" then
output;
end;
/* Close the directory */
rc2=fclose(EntryID);
if rc2 ^=0 then
abort;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.