SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Summary data into new variables and proc append

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 6
Accepted Solution

Summary data into new variables and proc append

Hello everyone,

 

I'm trying to automatize a simple yet time-consuming task. 

 

Everyday I have new data on interviews. I need to keep track of how many interviews are created each day and how many of these interviews are closed. 

 

What I have (variables):

Created (date of creation, e.g. "30/01/2018")

closetim (date when interview is completed, e.g. "02/02/2018")

 

Now, what I'm trying to accomplish is to create a dataset with 4 columns:

1st column: which tells me the date of when the data was downloaded (e.g. 03/02/2018),

2nd column = "Created" date

3rd column = "count of interviews with date =  "created" date

4th column = "count of observations that has been closed (closetim not equal to missing) where date = "created" date"

 

Then, each day I can run the code and append new counts of interviews created and closed to an access database. I have tried to visualise it in the attached screendumps: 

Screen Shot 2018-02-23 at 11.29.06.pngScreen Shot 2018-02-23 at 11.29.35.png

 

Any help is much appreciated,

Kind regards, Kristian 

 

I'm using SAS Base 9.4 at work (student assistant), and sas university edition at home. 

 

My code so far: 

 

DATA have;
SET have;
 

DATA have;
	SET have;
	created	= datepart(created);
	closetim	= datepart(closetim);

	ATTRIB created  FORMAT = ddmmyy8.;
	ATTRIB closetim FORMAT = ddmmyy8.;
RUN;

PROC SQL;
	CREATE TABLE newdata AS
			SELECT Created, closetim
		FROM have
		ORDER BY created;
QUIT;

DATA newdata;
	set newdata;
		downloaddate = today();
		ATTRIB downloaddate FORMAT = ddmmyy8.;
RUN;

DATA newdata; 
	SET newdata;
	c_created = count(created);
RUN;

DATA newdata; 
SET newdata; 
	if closetim ^= . then c_closetim = "closed";
RUN;

 


Accepted Solutions
Solution
‎02-24-2018 03:17 AM
Respected Advisor
Posts: 4,743

Re: Summary data into new variables and proc append

Posted in reply to kristiantdc

Not much wizardry required. A simple SQL should do the job. Below code uses your sample data.

DATA have;
    LENGTH
        created            8
        closetim           8 ;
    FORMAT
        created          DDMMYY10.
        closetim         DDMMYY10. ;
    INFORMAT
        created          DDMMYY10.
        closetim         DDMMYY10. ;
    INFILE 'C:\temp\data.csv'
        firstobs=2
        DLM=';'
        truncover
        DSD ;
    INPUT
        created 
        closetim ;
RUN;

proc sql feedback;
  create table want as
    select
      today() as date_downloaded format=ddmmyy10.,
      created,
      count(created) as count_created,
      count(closetim) as count_closed
    from have
    group by created
    ;
quit;

Ideally the downloaded date doesn't get populated using the today() function as that's the date when you execute the code and not necessarily when you've downloaded the data. If you've got the date of the data somewhere in the filename of the downloaded data or in the data itself then that's where you should retrieve the value for "downloaded date" from.

 

 

As you just need a simple SQL to aggregate your data I believe you could also do everything directly within MS Access (if you want to).

View solution in original post


All Replies
PROC Star
Posts: 1,317

Re: Summary data into new variables and proc append

Posted in reply to kristiantdc

Although I don't completely follow your requirements, it looks like you need to summarize your data on "Created", and also on "Created", but with an indication that the interview is closed. PROC MEANS and PROC SQL are good choices to do this. The summarized results then need to be merged back with your original dataset. PROC SQL is perfect for this.

 

Because you posted a picture of your data, nobody can work with it. Take a look at the advice for new posters, and provide your data in a more usable format. Then people can try things, and make suggestions.

 

Tom

Occasional Contributor
Posts: 6

Re: Summary data into new variables and proc append

Thank you for your answer, Tom.

 

I'll try to restate my problem.

 

The data I have is just about similar to regular paneldata, i.e. I'm tracking how many interviews, created on a given day, have been closed over multiple days/months. 

 

Each day I'm downloading the same data. I would expect that after a certain period, say a week, new interviews have been created and some amount of the interviews created in the past - that hasn't been closed - will be now be closed. This is essentially what I would like to create, as pictured in the attachement. 

 

I followed your advise and have attached the data which I downloaded today. Hopefully, some SAS wizard is able to get me to the next step. 

Kind regards, 
Kristian

 

 

 

Super User
Posts: 23,778

Re: Summary data into new variables and proc append

Posted in reply to kristiantdc

Where does data download date come from? Is the data fully refreshed each time or do you have to append the results from each month? I would guess it's a fully refresh each month?

Occasional Contributor
Posts: 6

Re: Summary data into new variables and proc append

It’s refreshed each night, so every morning new “Created” dates are found and some interviews will be closed. So yes, fully refreshed.
Solution
‎02-24-2018 03:17 AM
Respected Advisor
Posts: 4,743

Re: Summary data into new variables and proc append

Posted in reply to kristiantdc

Not much wizardry required. A simple SQL should do the job. Below code uses your sample data.

DATA have;
    LENGTH
        created            8
        closetim           8 ;
    FORMAT
        created          DDMMYY10.
        closetim         DDMMYY10. ;
    INFORMAT
        created          DDMMYY10.
        closetim         DDMMYY10. ;
    INFILE 'C:\temp\data.csv'
        firstobs=2
        DLM=';'
        truncover
        DSD ;
    INPUT
        created 
        closetim ;
RUN;

proc sql feedback;
  create table want as
    select
      today() as date_downloaded format=ddmmyy10.,
      created,
      count(created) as count_created,
      count(closetim) as count_closed
    from have
    group by created
    ;
quit;

Ideally the downloaded date doesn't get populated using the today() function as that's the date when you execute the code and not necessarily when you've downloaded the data. If you've got the date of the data somewhere in the filename of the downloaded data or in the data itself then that's where you should retrieve the value for "downloaded date" from.

 

 

As you just need a simple SQL to aggregate your data I believe you could also do everything directly within MS Access (if you want to).

Occasional Contributor
Posts: 6

Re: Summary data into new variables and proc append

Hi Patrick,

 

This is exactly what I was looking for. Thank you for your time :-)

 

The data is downloaded each morning at 02:00 and the SAS-code will enter a setup that runs each morning at 05:00 so it should be okay. 

Respected Advisor
Posts: 4,743

Re: Summary data into new variables and proc append

[ Edited ]
Posted in reply to kristiantdc

@kristiantdc

I always try to design in a way that allows for reprocessing of data or for catch-up runs. Right now you might not see any reason for this... but you never know (i.e. you're sick for a few days and can't process your data).

 

For this reason I'd never use the execution date but I'd ensure that I've got the "data date" somewhere in the actual data or then in the filename itself, i.e. <filename_yyyymmdd.csv - something you could implement as part of your download process.

Occasional Contributor
Posts: 6

Re: Summary data into new variables and proc append

Posted in reply to kristiantdc
@Patrick

That’s an excellent point. I’m not sure whether the possibility of extracting such a date exist in the current setup (the data source comes from an external supplier and thus I’m limited to the variables they can ‘create’, as per say an ‘in-data date’ variable). I’ll definitely check with our supplier tomorrow. Thank you for the advice!
Super User
Posts: 23,778

Re: Summary data into new variables and proc append

Posted in reply to kristiantdc

It’s not in the file name? Or maybe it’s in the file properties, or created date?


kristiantdc wrote:
@Patrick

That’s an excellent point. I’m not sure whether the possibility of extracting such a date exist in the current setup (the data source comes from an external supplier and thus I’m limited to the variables they can ‘create’, as per say an ‘in-data date’ variable). I’ll definitely check with our supplier tomorrow. Thank you for the advice!

 

Occasional Contributor
Posts: 6

Re: Summary data into new variables and proc append

The metadata of the .csv file contains the creation date. But how I extract this "created" date of the file and put it into downloaded_date with SAS, I have absolutely no idea about. 

Highlighted
PROC Star
Posts: 1,317

Re: Summary data into new variables and proc append

Posted in reply to kristiantdc

Well, that part I can help with.

 

Tom

 

data DatasetDates;
	length FileRef $8 OptionName $20 OptionVal $1024;
	keep OptionName OptionVal;

	/* Assign the fileref */
	call missing(FileRef);

	/* Blank, so SAS will assign a file name */
	rc1=filename(FileRef, "/SomeDirectory/SomeFile.csv");

	/* Associate the file name with the dataset or directory */
	if rc1 ^=0 then
		abort;

	/* Open the entry for access by SAS. If it is a directory, zero will be returned */
	EntryID=fopen(FileRef);

	/* Get the number of options for the dataset */
	OptNum=foptnum(EntryID);

	if OptNum=. then
		abort;

	/* Get all of the options for the dataset */
	do OptCount=1 to OptNum;
		OptionName=foptname(EntryID, OptCount);
		OptionVal=finfo(EntryID, OptionName);

		if OptionName="Last Modified" then
			output;
	end;

	/* Close the directory */
	rc2=fclose(EntryID);

	if rc2 ^=0 then
		abort;
run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 660 views
  • 3 likes
  • 4 in conversation