Removing dates that fall outside of a range that is specified in another dataset

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 9
Accepted Solution

Removing dates that fall outside of a range that is specified in another dataset

Hi;

 

I've been asked to write some code which will deal with date variables and date ranges within a dataset (in SAS 9,3.)

 

The specification details that the code should be able to handle an unknown number of date variables (in ddmmyy10. format) , with unspecified names.  It should also remove any dates in a variable if they fall outside of a specified range...... if a range is specified that is.

 

My thought was to create a csv parameter file that could be imported in, specifying the names of any variables, and a start and end date if needed.

 

So, i  imported the file:

data test;

infile date dsd dlm='2c'x;

informat name $32. start ddmmyy10. stop ddmmyy10.;

input name start stop;

format start stop ddmmyy10.;

run;

(eg of file contents:

one_date          01/01/1990     12/12/1999 

another_date           .             .       

)

 

then created macro variables for each column:

proc sql;

select name into: date_vars separated by " " from test;

select start into: start separated by " " from test;

select stop into: stop separated by " " from test;

quit;

from here, my first thought was a simple if then statement:

 

data want;

set have;

if &start ne . then if date_vars le &start then &date_vars = .;

run;

 

Which falls over with

NOTE: Line generated by the macro variable "START".

1 01/01/1990 .

-

22

 

as well as syntax errors.

 

I also tried arrays, which ended up with similar problems as well as "too many subscripts....." messages.:

array datevars{*} &date_vars ;

array startdates{*} &start ;

do i = 1 to dim(datevars);

do j = 1 to dim2(startdates);

if startdates{j} ne '.' then

if (datevars{i} le startdates{j}) then datevars{i} = ' ';

end;

run;

 

I realise I must have made some very rookie errors, but any pointers as to where I am going wrong (probably all of it...  Smiley Surprised) )or any other help to accomplish this would be greatly appreciated!  Smiley Surprised)

 

 

Many Thanks in advance

 


Accepted Solutions
Solution
‎02-26-2016 05:33 AM
Esteemed Advisor
Esteemed Advisor
Posts: 7,203

Re: Removing dates that fall outside of a range that is specified in another dataset

Yes, sorry, the "..."d only takes date9. format e.g. 01JAN2014.  Change the step to:

 

data _null_;
  set test;
  call execute('data <your_dataset>; set <your_dataset>; if "'||put(input(start,ddmmyy10.),date9.)||'"d <= date <= "'||put(input(end,ddmmyy10.),date9.)||'"d then output;');
run;

Or you could actually just use the numeric value:

data _null_;
  set test;
  call execute('data <your_dataset>; set <your_dataset>; if '||input(start,ddmmyy10.)||' <= date <= '||input(end,ddmmyy10.)||' then output;');
run;

 

View solution in original post


All Replies
Esteemed Advisor
Posts: 5,198

Re: Removing dates that fall outside of a range that is specified in another dataset

There are many ways to do this. I suspect that you will be served a variety of solutions.

How many variables (approx) can there be?

I would probably read the csv and generate a macro variable, containing a complete where-statement. So the most logic will in that step, and then just refer to the macro variable in a PROC SQl delete from, or a data step.

Data never sleeps
Occasional Contributor
Posts: 9

Re: Removing dates that fall outside of a range that is specified in another dataset

Many thanks for the reply..

 

There could be almost any amount of variables - 20ish would be average..

 

How would you generate a macro variable, containing a complete where-statement? My marcro variable knowledge is confined to creating simple lists..  Smiley Happy

 

Thanks again

Esteemed Advisor
Esteemed Advisor
Posts: 7,203

Re: Removing dates that fall outside of a range that is specified in another dataset

Hi,

 

I am thinking you will need to be more specific.  You say there can be X number of date columns in a dataset, what happens if those date ranges are overlapping, i.e. some fit the criteria, others don't?  Post some test data, in the form of a datastep, and what you want the ouput to look like.  At a guess, I would say from this step, you have the means to generate the code directly:

data test;
  infile date dsd dlm='2c'x;
  informat name $32. start ddmmyy10. stop ddmmyy10.;
  input name start stop;
  format start stop ddmmyy10.;
run;

So you could then say something like (and not tested as no test data):

data _null_;
  set test;
  call execute('data <your_dataset>; set <your_dataset>; if "'||put(start,ddmmyy10.)||'"d <= date <= "'||put(end,ddmmyy10.)||'"d then output;');
run;

This will create a SAS datastep code for each variable (you could do it all in one if you like, just kept it like this for brevity) which does an if statement and only outputs data if date is between that.  This is iterative, so each variable top to bottom drops records from the selection before it.

Occasional Contributor
Posts: 9

Re: Removing dates that fall outside of a range that is specified in another dataset

Thankyou the replies..

 


 


RW9 wrote:

Hi,

 

I am thinking you will need to be more specific.  You say there can be X number of date columns in a dataset, what happens if those date ranges are overlapping, i.e. some fit the criteria, others don't?

 

 


 


This is the crux of the problem..The code is required to be generic, it cant refence any range or variable directly (as they may not exist at the time the code is run). So created the parameter file as above, which could contain any changing information.

 

For example,  if I had..

 

Have:

Name    one_date             Antother_date

AB          08/06/2007         12/05/1977

BB           12/05/1996         15/05/1990

CB           17/08/1985         20/08/1985

DB          11/11/1991         14/11/1991

 

 

Then with the date ranges supplied in the parameter file,  hopefully I would get:
Want:

Name    one_date             Antother_date

AB          .                              12/05/1977

BB           12/05/1996         15/05/1990

CB           .                             20/08/1985

DB          11/11/1991         14/11/1991

 

 

Esteemed Advisor
Esteemed Advisor
Posts: 7,203

Re: Removing dates that fall outside of a range that is specified in another dataset

So did you try the call execute code I placed above?  What that does is takes the dataset imported from your parameter file, and generates a datastep with one if statement for each row of the parameter file.  It should give you what you want.

Occasional Contributor
Posts: 9

Re: Removing dates that fall outside of a range that is specified in another dataset

Hi,

I apologise if I've made some really simple errors here, but I am still relatively speaking quite new to SAS, and am currently out of my comfort zone (which Is good though...)

 

 I presume its down to the way that my data is set up, but when I run it I get:

ERROR 77-185: Invalid number conversion on "01/01/1990"d. and

 

ERROR: Invalid date/time/datetime constant "01/01/1990"d.

ERROR: Invalid date/time/datetime constant "31/12/1995"d.

ERROR 77-185: Invalid number conversion on "31/12/1995"d.

 

If I remove the "d"'s from the code, then it comes up with "variable date not initialised". If I change date in the code, to a variable in the dataset, then I get  a lot of "invalid numeric data" messages. All my test data are formatted to ddmmyy10.

 

I think I've overlooked something, just not sure what.. Smiley Happy

 

 

Solution
‎02-26-2016 05:33 AM
Esteemed Advisor
Esteemed Advisor
Posts: 7,203

Re: Removing dates that fall outside of a range that is specified in another dataset

Yes, sorry, the "..."d only takes date9. format e.g. 01JAN2014.  Change the step to:

 

data _null_;
  set test;
  call execute('data <your_dataset>; set <your_dataset>; if "'||put(input(start,ddmmyy10.),date9.)||'"d <= date <= "'||put(input(end,ddmmyy10.),date9.)||'"d then output;');
run;

Or you could actually just use the numeric value:

data _null_;
  set test;
  call execute('data <your_dataset>; set <your_dataset>; if '||input(start,ddmmyy10.)||' <= date <= '||input(end,ddmmyy10.)||' then output;');
run;

 

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 560 views
  • 0 likes
  • 3 in conversation