BookmarkSubscribeRSS Feed
blablabla
Calcite | Level 5

Dear fellow SAS-users,

This is very likely to be a dreadfully unintelligent question on my part with a (hopefully) trivial answer, but please bear with me as I am mainly used to other programming languages and failry new to SAS. Altough the actual problem is somewhat more elaborate, I can simplify and reduce it to the following: I have table t1 containing a very large list of individuals and the dates they were on holiday in the format [ID] [HOLIDAY_STARTDATE] [HOLIDAY_ENDDATE]. In table t2, I have a list of every date in a given year with a column of 0's, as in [DATE_DAY][ONHOLIDAY]. Now what I would like to do is 1) check for each individual in t1 and each dat in t2 whether or not the particular date is within the range of an individual's holiday and if so 2) set the corresponding [ONHOLIDAY] value to 1. The end product should look something like this:

[ID]      [DATE]          [ONHOLIDAY]

1          01/01/2014     0

1          02/01/2014     0

1          03/01/2014     1

1          04/01/2014     1

1          05/01/2014     0

...

ID     31/12/2014     0

Effectively creating 365/366 entries for each individual having a 1 for each day they were on holiday.

Now I did try the following (although quite amateuristic) in PROC SQL, given the "zero-vector" per day (t1) and individual holiday dates (t2):

proc sql;

     create table t3 as

           select distinct ID, DATE,

                case

                     when DATE between HOLIDAY_STARTDATE and HOLIDAY_ENDDATE then 1

                     else 0

                end

           as ONHOLIDAY

                from t1, t2

                     group by DATE having (ONHOLIDAY = max(ONHOLIDAY))

                           order by DATE, ONHOLIDAY;

quit;

This works fine for small numbers of individuals, however when attempting to run this against the large database I have, I get error 5430 (http://support.sas.com/kb/5/430.html) , essentially telling me my query was too complex (note that I cannot change the cfg-files on my company laptop so I am stuck with this error unless I can optimize the query, also compression did not help).

Now, I would like to know whether there is an alternative to this. I am used to fairly easily writing loops and do work on matrices for this kind of simple problem (i.e. MATLAB), but SAS necessitates a different way of thinking, hence my question. Any suggestions are more than welcome!

Thanks a lot in advance!

3 REPLIES 3
Ksharp
Super User

How big are these two tables ?

data t1;
input id start : ddmmyy10. end : ddmmyy10.;
format start  end  ddmmyy10.;
cards;
1 03/01/2014 04/01/2014
;
run;
data t2;
input id DATE : ddmmyy10.;
format  date ddmmyy10.;
cards;
1          01/01/2014     
1          02/01/2014     
1          03/01/2014    
1          04/01/2014    
1          05/01/2014   
;
run;

data key;
 set t1;
 do date=start to end;
  output;
 end;
 drop start end;
run;
data want;
 merge t2 key(in=inkey);
 by id date;
 ONHOLIDAY=ifn(inkey,1,0);
run;



Xia Keshan

blablabla
Calcite | Level 5

Hi,

I would say about 500-1000Mb per year of data for the holiday statistics (the day vector itself is likely to be very small, their "product" by contrast may become much larger).

Thanks for the reply!

-Alexander

Kurt_Bremser
Super User

How about this:

data have;

infile cards;

input id startdate enddate;

informat startdate enddate ddmmyy10.;

format startdate enddate ddmmyy10.;

cards;

1 10/01/2014 12/01/2014

1 15/06/2014 20/06/2014

2 01/07/2014 31/07/2014

;

run;

proc sort data=have;

by id startdate;

run;

data want (keep=id date onholiday);

set have;

by id;

retain date;

format date ddmmyy10.;

if first.id then date = '01jan2014'd;

onholiday = 0;

do until (date >= startdate);

  output;

  date = date + 1;

end;

onholiday = 1;

do date = startdate to enddate;

  output;

end;

if last.id

then do;

  onholiday = 0;

  do until (date >= '31dec2014'd + 1);

    output;

    date = date + 1;

  end;

end;

run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 2474 views
  • 0 likes
  • 3 in conversation