BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kristiantdc
Calcite | Level 5

Hello everyone,

 

I'm trying to automatize a simple yet time-consuming task. 

 

Everyday I have new data on interviews. I need to keep track of how many interviews are created each day and how many of these interviews are closed. 

 

What I have (variables):

Created (date of creation, e.g. "30/01/2018")

closetim (date when interview is completed, e.g. "02/02/2018")

 

Now, what I'm trying to accomplish is to create a dataset with 4 columns:

1st column: which tells me the date of when the data was downloaded (e.g. 03/02/2018),

2nd column = "Created" date

3rd column = "count of interviews with date =  "created" date

4th column = "count of observations that has been closed (closetim not equal to missing) where date = "created" date"

 

Then, each day I can run the code and append new counts of interviews created and closed to an access database. I have tried to visualise it in the attached screendumps: 

Screen Shot 2018-02-23 at 11.29.06.pngScreen Shot 2018-02-23 at 11.29.35.png

 

Any help is much appreciated,

Kind regards, Kristian 

 

I'm using SAS Base 9.4 at work (student assistant), and sas university edition at home. 

 

My code so far: 

 

DATA have;
SET have;
 

DATA have;
	SET have;
	created	= datepart(created);
	closetim	= datepart(closetim);

	ATTRIB created  FORMAT = ddmmyy8.;
	ATTRIB closetim FORMAT = ddmmyy8.;
RUN;

PROC SQL;
	CREATE TABLE newdata AS
			SELECT Created, closetim
		FROM have
		ORDER BY created;
QUIT;

DATA newdata;
	set newdata;
		downloaddate = today();
		ATTRIB downloaddate FORMAT = ddmmyy8.;
RUN;

DATA newdata; 
	SET newdata;
	c_created = count(created);
RUN;

DATA newdata; 
SET newdata; 
	if closetim ^= . then c_closetim = "closed";
RUN;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Not much wizardry required. A simple SQL should do the job. Below code uses your sample data.

DATA have;
    LENGTH
        created            8
        closetim           8 ;
    FORMAT
        created          DDMMYY10.
        closetim         DDMMYY10. ;
    INFORMAT
        created          DDMMYY10.
        closetim         DDMMYY10. ;
    INFILE 'C:\temp\data.csv'
        firstobs=2
        DLM=';'
        truncover
        DSD ;
    INPUT
        created 
        closetim ;
RUN;

proc sql feedback;
  create table want as
    select
      today() as date_downloaded format=ddmmyy10.,
      created,
      count(created) as count_created,
      count(closetim) as count_closed
    from have
    group by created
    ;
quit;

Ideally the downloaded date doesn't get populated using the today() function as that's the date when you execute the code and not necessarily when you've downloaded the data. If you've got the date of the data somewhere in the filename of the downloaded data or in the data itself then that's where you should retrieve the value for "downloaded date" from.

 

 

As you just need a simple SQL to aggregate your data I believe you could also do everything directly within MS Access (if you want to).

View solution in original post

11 REPLIES 11
TomKari
Onyx | Level 15

Although I don't completely follow your requirements, it looks like you need to summarize your data on "Created", and also on "Created", but with an indication that the interview is closed. PROC MEANS and PROC SQL are good choices to do this. The summarized results then need to be merged back with your original dataset. PROC SQL is perfect for this.

 

Because you posted a picture of your data, nobody can work with it. Take a look at the advice for new posters, and provide your data in a more usable format. Then people can try things, and make suggestions.

 

Tom

kristiantdc
Calcite | Level 5

Thank you for your answer, Tom.

 

I'll try to restate my problem.

 

The data I have is just about similar to regular paneldata, i.e. I'm tracking how many interviews, created on a given day, have been closed over multiple days/months. 

 

Each day I'm downloading the same data. I would expect that after a certain period, say a week, new interviews have been created and some amount of the interviews created in the past - that hasn't been closed - will be now be closed. This is essentially what I would like to create, as pictured in the attachement. 

 

I followed your advise and have attached the data which I downloaded today. Hopefully, some SAS wizard is able to get me to the next step. 

Kind regards, 
Kristian

 

 

 

Reeza
Super User

Where does data download date come from? Is the data fully refreshed each time or do you have to append the results from each month? I would guess it's a fully refresh each month?

kristiantdc
Calcite | Level 5
It’s refreshed each night, so every morning new “Created” dates are found and some interviews will be closed. So yes, fully refreshed.
Patrick
Opal | Level 21

Not much wizardry required. A simple SQL should do the job. Below code uses your sample data.

DATA have;
    LENGTH
        created            8
        closetim           8 ;
    FORMAT
        created          DDMMYY10.
        closetim         DDMMYY10. ;
    INFORMAT
        created          DDMMYY10.
        closetim         DDMMYY10. ;
    INFILE 'C:\temp\data.csv'
        firstobs=2
        DLM=';'
        truncover
        DSD ;
    INPUT
        created 
        closetim ;
RUN;

proc sql feedback;
  create table want as
    select
      today() as date_downloaded format=ddmmyy10.,
      created,
      count(created) as count_created,
      count(closetim) as count_closed
    from have
    group by created
    ;
quit;

Ideally the downloaded date doesn't get populated using the today() function as that's the date when you execute the code and not necessarily when you've downloaded the data. If you've got the date of the data somewhere in the filename of the downloaded data or in the data itself then that's where you should retrieve the value for "downloaded date" from.

 

 

As you just need a simple SQL to aggregate your data I believe you could also do everything directly within MS Access (if you want to).

kristiantdc
Calcite | Level 5

Hi Patrick,

 

This is exactly what I was looking for. Thank you for your time 🙂

 

The data is downloaded each morning at 02:00 and the SAS-code will enter a setup that runs each morning at 05:00 so it should be okay. 

Patrick
Opal | Level 21

@kristiantdc

I always try to design in a way that allows for reprocessing of data or for catch-up runs. Right now you might not see any reason for this... but you never know (i.e. you're sick for a few days and can't process your data).

 

For this reason I'd never use the execution date but I'd ensure that I've got the "data date" somewhere in the actual data or then in the filename itself, i.e. <filename_yyyymmdd.csv - something you could implement as part of your download process.

kristiantdc
Calcite | Level 5
@Patrick

That’s an excellent point. I’m not sure whether the possibility of extracting such a date exist in the current setup (the data source comes from an external supplier and thus I’m limited to the variables they can ‘create’, as per say an ‘in-data date’ variable). I’ll definitely check with our supplier tomorrow. Thank you for the advice!
Reeza
Super User

It’s not in the file name? Or maybe it’s in the file properties, or created date?


@kristiantdc wrote:
@Patrick

That’s an excellent point. I’m not sure whether the possibility of extracting such a date exist in the current setup (the data source comes from an external supplier and thus I’m limited to the variables they can ‘create’, as per say an ‘in-data date’ variable). I’ll definitely check with our supplier tomorrow. Thank you for the advice!

 

kristiantdc
Calcite | Level 5

The metadata of the .csv file contains the creation date. But how I extract this "created" date of the file and put it into downloaded_date with SAS, I have absolutely no idea about. 

TomKari
Onyx | Level 15

Well, that part I can help with.

 

Tom

 

data DatasetDates;
	length FileRef $8 OptionName $20 OptionVal $1024;
	keep OptionName OptionVal;

	/* Assign the fileref */
	call missing(FileRef);

	/* Blank, so SAS will assign a file name */
	rc1=filename(FileRef, "/SomeDirectory/SomeFile.csv");

	/* Associate the file name with the dataset or directory */
	if rc1 ^=0 then
		abort;

	/* Open the entry for access by SAS. If it is a directory, zero will be returned */
	EntryID=fopen(FileRef);

	/* Get the number of options for the dataset */
	OptNum=foptnum(EntryID);

	if OptNum=. then
		abort;

	/* Get all of the options for the dataset */
	do OptCount=1 to OptNum;
		OptionName=foptname(EntryID, OptCount);
		OptionVal=finfo(EntryID, OptionName);

		if OptionName="Last Modified" then
			output;
	end;

	/* Close the directory */
	rc2=fclose(EntryID);

	if rc2 ^=0 then
		abort;
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 11 replies
  • 1909 views
  • 3 likes
  • 4 in conversation