Outputting data into seperate months

Sean_OConnor · Posted 02-20-2019 05:00 AM

Folks,

I'm working trying to make some code as a efficient as possible. Basically I will have master dataset which I'll need to put into smaller datasets (based on months) as I receive data. So when I receive data of circa 4million records I'll need to move those records into different datasets depending what date they have on them.

I've created some code but I'm running into problems. See below for what I've written.

*Amend the raw file you receive ;
data delivery_n ;
set data_of_interest;
time=catx('-',paydate,time_stamp);
date = input(paydate,yymmdd10.);  
format date date10.;
newdate = put(date,yymmn6.);
period='Period';
newdate1=catx('_',period,newdate);
 run;

*Write the distinct months based on variable created newdate1
proc sql noprint;
  select distinct compbl(newdate1) into :period separated by " "
  from month_t;
quit;
filename x temp;
data _null_;
  attrib period length=$6 ds length=$18;
  file x;
  i=1;
  do while(scan("&period",i," ") ne "");
    ds=scan("&period",i," ");
    period=scan(ds,-1,"_");
    put "if period=""" period +(-1) """ then output " ds ";";
    i+1;
  end;
run;

*Output your data into separate  monthly datasets 
%macro i(range_high,range_low);
	%local offset start_date end_date suffix;

	%do offset=0 %to %sysfunc(intck(month,&range_high,&range_low)) %by -1;
		%let start_date=%sysfunc(intnx(month,&range_high,&offset,b));
		%let end_date=%sysfunc(intnx(month,&start_date,0,e));
data &period ;
set month_t;
 %inc x
 if &start_date <= date <=&end_date then output i;
run;
%end;
%mend i;

%i('01dec2019'd,'01jan2019'd);

I'm getting the following issue;

WARNING: Source line spooling has been disabled. %INCLUDE of line numbers is not available.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 402033 observations read from the data set WORK.MONTH_T.
WARNING: The data set WORK.PERIOD_201901 may be incomplete.  When this step was stopped there were 402033 observations and 83 
         variables.
WARNING: The data set WORK.PERIOD_201902 may be incomplete.  When this step was stopped there were 402033 observations and 83 
         variables.
WARNING: The data set WORK.PERIOD_201903 may be incomplete.  When this step was stopped there were 402033 observations and 83 
         variables.
WARNING: The data set WORK.PERIOD_201904 may be incomplete.  When this step was stopped there were 402033 observations and 83 
         variables.
WARNING: The data set WORK.PERIOD_201912 may be incomplete.  When this step was stopped there were 402033 observations and 83 
         variables.

The datasets are not separating out but appear to be writing all observations to each separate dataset.

Any help would be great.

Sean_OConnor · Posted 02-20-2019 05:04 AM

I note that towards the end we should include output &period but the same issue occurs.

RW9 · Posted 02-20-2019 05:22 AM

I can tell nothing from that log snippet other than there was an error somewhere before it. If you really have to (and there has to be a very good reason = rare) split data up - as you can get big read/write impact on your code, it makes coding and maintenance harder etc., then something like:

proc sort data=have (rename=(period_start=period)) out=loop nodupkey;
  by period;
run;

data _null_;
  set loop;
  call execute(cats('data period_',period,'; set have; where period_start <= ',period,' < period_end; run;'));
run;

Note, I do not know your data, so just made the above up as an example. Basically this will create a distinct list of the output periods, then with this distinct list in the data null, one datastep per period is created and executed. Its a lot simpler than macro loops.

Sean_OConnor · Posted 02-20-2019 05:59 AM

Hi,

Just to further elaborate.

My dataset have as a sas data variable called (period) which ranges from 1JAN2019 to the 31DEC2019.

What I need to do is write some code that places observations into different datasets based on the period.

So if any observation falls between the 1JAN2019 to the 31DEC2019 it gets outputted to the January dataset.

If any observations falls between 1APR2019 to the 30APR2019 it gets outputted to the April dataset.

I need to split data up into monthly datasets as I have to create monthly aggregated datasets for individuals in my company to carry out analysis.

Hence my code to begin was trying to do this, but was running into difficulties.

RW9 · Posted 02-20-2019 07:10 AM

Again why do you need to? It is very simple to assign a grouping variable:

data want;
  set have;
  array group{12} 1;
  do i=1 to 12;
    if month(date) >= i then group{i}=1;
  end;
run;

You then have a set of group variables which you can perform procedures by, for example to run a proc means on all data from april:

proc means data=want;
  var result;
  where group{4}=1;
  output out=april n=n;
run;

You could of course put the grouping into one group, then do it across all by groups very simply.

This code minimises the read/writes - i.e. saving processor time, and saving disk space. Minimises the coding needed, all inbuilt simple Base SAS, and minimises upkeep.

Anyways, am sure you will go ahead anyways, so here is my example - note how I put some test data in a datastep:

data have;
  date='11JAN2019'd; result=12; output;
  date='12FEB2019'd; result=12; output;
run;

data have;
  set have;
  period=month(date);
run;

data _null_;
  do i=1 to 12;
    call execute(cats('data period_',put(i,best.),'; set have; where month(date) >= ',put(i,best.),'; run;'));
  end;
run;

This will create 12 datasets, each where month of date is greater than current iteration, so 2 records in period1, 1 in period2, 0 in all others. However do bear in mind that is 12 read/writes, + 12 header block storage, this code will run slower than doing the by group processing.

Astounding · Posted 02-20-2019 08:58 AM

It be the same mistake that we have all made.

%inc x

is missing a semicolon.

Outputting data into seperate months

Re: Outputting data into seperate months

Re: Outputting data into seperate months

Re: Outputting data into seperate months

Re: Outputting data into seperate months

Re: Outputting data into seperate months

Outputting data into seperate months

Re: Outputting data into seperate months

Re: Outputting data into seperate months

Re: Outputting data into seperate months

Re: Outputting data into seperate months

Re: Outputting data into seperate months

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away