DATA Step, Macro, Functions and more

Split data into quarters

Contributor
Posts: 32

Split data into quarters


Hi,

Can you please help me split the file into quarters according to filldat (column F in the attached file)?

For instance, if the date is 01/25/2011 it should be treated as Q1_2011, and if the date is 12/25/2010 it should be treated as Q4_2010.

I have attached similar code for splitting into months, but I am not sure how to do it for quarters.

Thanks,

+Sathyan

Attachment

All Replies
Frequent Contributor
Posts: 138

Re: Split data into quarters

There is a function to get the quarter of the year, which can then be concatenated with the corresponding year.
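For example, a minimal sketch combining the QTR and YEAR functions into the label the question asks for (the variable name filldat is taken from the attached file; the date is the one from the question):

data _null_;
  filldat = '25JAN2011'd;                                  /* example date from the question */
  quarter_label = cats('Q', qtr(filldat), '_', year(filldat));
  put quarter_label=;                                      /* prints quarter_label=Q1_2011   */
run;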

Super Contributor
Posts: 349

Re: Split data into quarters

Hi,

Try this..

data want;
input date;
informat date mmddyy10.;
format date date9.;
year=year(date);
qtr=qtr(date);
/* build a Qn_YYYY label from the quarter and year */
DateValue=compress('Q'||qtr||'_'||year);
cards;
01/25/2011
12/25/2010
;
run;
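As an aside, the DateValue assignment above could instead use CATS, as in the other replies, which avoids the implicit numeric-to-character conversion notes in the log:

DateValue=cats('Q',qtr,'_',year);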

Thanks,

Shiva

Solution
04-27-2012 03:29 AM
Super User
Posts: 9,681

Re: Split data into quarters

How about:

/* read the Excel sample file into WORK.HAVE */
proc import datafile='d:\software\SampleFile.xls' out=have dbms=excel replace;
 getnames=yes;
run;

/* convert FILLDAT from the imported text value to a numeric SAS date */
data have(drop=_f);
 set have(rename=(filldat=_f));
 filldat=input(_f,mmddyy10.);
 format filldat mmddyy10.;
run;

proc sort data=have;
 by filldat;
run;

/* build a quoted, comma-separated list of every variable in HAVE */
proc sql noprint;
 select quote(name) into :list separated by ','
  from dictionary.columns
   where libname='WORK' and memname='HAVE';
quit;

/* GROUPFORMAT with the YYQ. format makes LAST.FILLDAT mark quarter
   boundaries, so each trip through the DO UNTIL loop loads one quarter
   into the hash and writes it to its own data set, e.g. Q1_2011 */
data _null_;
 declare hash ha(multidata:'Y');
 ha.definekey('filldat');
 ha.definedata(&list);
 ha.definedone();
 do until(last.filldat);
  set have;
  by filldat groupformat;
  format filldat yyq.;
  ha.add();
 end;
 ha.output(dataset: cats('Q',qtr(filldat),'_',year(filldat)));
run;



Ksharp

Respected Advisor
Posts: 3,892

Re: Split data into quarters

Or this way?


proc sql noprint;
  /* build two macro variables: the list of target data set names, and the
     matching WHEN ... OUTPUT statements, one per distinct quarter */
  select distinct
    cats('want_',put(filldat,yyq6.)) ,
    cat(' ',"when('",put(filldat,yyq6.),"') output ",cats('want_',put(filldat,yyq6.),';'))
      into :Data_DsList separated by ' ' ,
           :Out_DsList separated by ' '
  from work.SampleFile
  ;
quit;

/* one pass over the data, routing each row to its quarter's data set */
data &Data_DsList;
  set work.SampleFile;
  select(put(filldat,yyq6.));
    &Out_DsList
    otherwise;
  end;
run;
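For the quarters that appear in the log further down (2011Q2 through 2012Q1), the generated code would resolve to roughly the following; the exact data set names depend on the dates in the file:

data want_2011Q2 want_2011Q3 want_2011Q4 want_2012Q1;
  set work.SampleFile;
  select(put(filldat,yyq6.));
    when('2011Q2') output want_2011Q2;
    when('2011Q3') output want_2011Q3;
    when('2011Q4') output want_2011Q4;
    when('2012Q1') output want_2012Q1;
    otherwise;
  end;
run;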

Super User
Posts: 9,681

Re: Split data into quarters

Patrick,

I want to know which code is more efficient, yours or mine.

:)

Ksharp

Respected Advisor
Posts: 3,892

Re: Split data into quarters

Mine, of course! It uses fewer characters :D

...and, more seriously: I believe the PROC SORT to be quite costly.

Super User
Posts: 9,681

Re: Split data into quarters

No, I do not think so.

You used PUT() for every observation, which is very inefficient.

I bet mine is faster than yours. :)

I hope the OP can give us some feedback.

Ksharp

Respected Advisor
Posts: 3,892

Re: Split data into quarters

You're writing the data three times to disk - even worse :O

Respected Advisor
Posts: 3,892

Re: Split data into quarters

O.K., you win. Unless I feed fewer rows to the SQL with the PUT statements, as done below.

But then, you could replace your PROC SORT with an SQL sort that creates a view, and I assume we would be more or less even.
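A minimal sketch of that idea (the view name vhave is illustrative; the hash step would then read vhave instead of the sorted HAVE):

proc sql;
  /* an ordered view in place of PROC SORT */
  create view vhave as
    select * from have order by filldat;
quit;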

Good to be reminded: I'm going to use functions next time. :)

16         data work.SampleFile;
17           set work.SampleFile;
18           do _i=1 to 20;
19             output;
20           end;
21         run;

NOTE: There were 56720 observations read from the data set WORK.SAMPLEFILE.
NOTE: The data set WORK.SAMPLEFILE has 1134400 observations and 41 variables.
NOTE: DATA statement used (Total process time):
      real time           0.67 seconds
      user cpu time       0.23 seconds
      system cpu time     0.42 seconds
      memory              30629.37k
      OS Memory           37732.00k
      Timestamp           27/04/2012 08:19:20 PM
     

22        
23         data have;
24          set work.SampleFile;
25          format filldat mmddyy10.;
26         run;

NOTE: There were 1134400 observations read from the data set WORK.SAMPLEFILE.
NOTE: The data set WORK.HAVE has 1134400 observations and 41 variables.
NOTE: DATA statement used (Total process time):
      real time           1.07 seconds
      user cpu time       0.26 seconds
      system cpu time     0.78 seconds
      memory              30629.37k
      OS Memory           37732.00k
      Timestamp           27/04/2012 08:19:21 PM
     


27        
28         options fullstimer;
29         /* Patrick */
30         proc sql noprint;
31           create view vSampleFile as select distinct filldat from SampleFile;
NOTE: SQL view WORK.VSAMPLEFILE has been defined.
32           select distinct
33             cats('want_',put(filldat,yyq6.)) ,
34             cat(' ',"when('",put(filldat,yyq6.),"') output ",cats('want_',put(filldat,yyq6.),';
34       ! '))
35               into :Data_DsList separated by ' ' ,
36                    :Out_DsList separated by ' '
37           from work.vSampleFile
38           ;
39         quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.96 seconds
      user cpu time       0.56 seconds
      system cpu time     0.15 seconds
      memory              46209.40k
      OS Memory           53564.00k
      Timestamp           27/04/2012 08:19:22 PM
     

40        
41         data &Data_DsList;
42           set work.SampleFile;
43           select(put(filldat,yyq6.));
44             &Out_DsList
45             otherwise;
46           end;
47         run;

NOTE: There were 1134400 observations read from the data set WORK.SAMPLEFILE.
NOTE: The data set WORK.WANT_2011Q2 has 255600 observations and 41 variables.
NOTE: The data set WORK.WANT_2011Q3 has 283200 observations and 41 variables.
NOTE: The data set WORK.WANT_2011Q4 has 300000 observations and 41 variables.
NOTE: The data set WORK.WANT_2012Q1 has 295600 observations and 41 variables.
NOTE: DATA statement used (Total process time):
      real time           5.78 seconds
      user cpu time       0.53 seconds
      system cpu time     0.60 seconds
      memory              46209.40k
      OS Memory           53564.00k
      Timestamp           27/04/2012 08:19:28 PM
     

48        
49        
50         /* Ksharp */
51         proc sort data=have;by filldat;run;


NOTE: There were 1134400 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.HAVE has 1134400 observations and 41 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           9.96 seconds
      user cpu time       1.37 seconds
      system cpu time     1.70 seconds
      memory              268894.25k
      OS Memory           276588.00k
      Timestamp           27/04/2012 08:19:38 PM
     

52         proc sql noprint;
53          select quote(name) into : list separated by ','
54           from dictionary.columns
55            where libname='WORK' and memname='HAVE';
56         quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.05 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              268894.25k
      OS Memory           276588.00k
      Timestamp           27/04/2012 08:19:39 PM
     

57         data _null_;
58         declare hash ha(multidata:'Y');
59          ha.definekey('filldat');
60          ha.definedata(&list);
61          ha.definedone();
62          do until(last.filldat);
63           set have;
64           by filldat groupformat;
65           format filldat yyq.;
66           ha.add();
67          end;
68          ha.output(dataset: cats('Q',qtr(filldat),'_',year(filldat)));
69         run;

NOTE: The data set WORK.Q2_2011 has 255600 observations and 41 variables.
NOTE: The data set WORK.Q3_2011 has 283200 observations and 41 variables.
NOTE: The data set WORK.Q4_2011 has 300000 observations and 41 variables.
NOTE: The data set WORK.Q1_2012 has 295600 observations and 41 variables.
NOTE: There were 1134400 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
      real time           6.34 seconds
      user cpu time       1.65 seconds
      system cpu time     0.79 seconds
      memory              549040.51k
      OS Memory           557128.00k
      Timestamp           27/04/2012 08:19:46 PM
     



Super User
Posts: 9,681

Re: Split data into quarters

Hi Patrick,

I guess the OP's dataset might already have been sorted.

My PROC SORT may not be needed at all.

Just a guess. :)

Ksharp

Valued Guide
Posts: 2,175

Re: Split data into quarters

If you want high efficiency, collect the data in memory, either in an array or in a hash table indexed by put(filldat, qtr_format.), and write it out at end-of-file.

For an array you would need to know the range (min/max dates), but a hash-table solution needs only the single pass. (For an array solution, you could presume "outer" bounds like 1970 Q1 to 2100 Q1 without taking a lot of memory, although the lower bound might need to be much earlier for data like date-of-birth.)

Another high-efficiency alternative writes to a separate table for each distinct quarter in a single data step, needing no sort, since there will only be a relatively small number of quarters.
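A rough sketch of the single-pass, no-sort hash idea, under a few assumptions: WORK.HAVE with a numeric SAS date FILLDAT, and the &list macro variable built from dictionary.columns as in Ksharp's post. An outer hash keyed by the quarter label holds one inner hash per quarter, and each inner hash is written to its own data set at end-of-file:

data _null_;
  length qtr_label $7;
  declare hash inner;                      /* one inner hash per quarter       */
  declare hash outer();                    /* maps quarter label -> inner hash */
  outer.definekey('qtr_label');
  outer.definedata('inner');
  outer.definedone();
  declare hiter it('outer');

  do until(eof);
    set have end=eof;
    qtr_label=cats('Q',qtr(filldat),'_',year(filldat));
    if outer.find() ne 0 then do;          /* first row seen for this quarter  */
      inner=_new_ hash(multidata:'Y');
      inner.definekey('filldat');
      inner.definedata(&list);             /* &list as built in Ksharp's step  */
      inner.definedone();
      outer.add();
    end;
    inner.add();                           /* file the row under its quarter   */
  end;

  /* end-of-file: write one data set per quarter, e.g. Q1_2011, Q4_2010 */
  do while(it.next()=0);
    inner.output(dataset:qtr_label);
  end;
  stop;
run;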

Respected Advisor
Posts: 3,892

Re: Split data into quarters

Thanks Peter.

This little "contest" between Ksharp and me wasn't really serious. My code was never designed for performance, and I don't believe performance matters much for the OP's real problem. I liked the discussion with Ksharp; even when it's not entirely serious, there is always something to learn and take away from it.

Respected Advisor
Posts: 4,649

Re: Split data into quarters

Patrick, Ksharp, your skilled discussion is very stimulating. There is a lot to be learned from trying to sort out such performance issues. It set me to doing my own "testing" along the same lines as Patrick's. I compared a simple SQL solution, inspired by your comments, with the hash-based program proposed by Ksharp. The SQL queries go as follows:

proc sql noprint;
  /* one row per distinct date, with its Qn_YYYY label */
  create table qtr as
    select fillDate,
           cats('Q',qtr(fillDate),'_',year(fillDate)) as qtr length=7
    from (select distinct fillDate from have)
    order by fillDate;

  /* generate one CREATE TABLE statement per quarter and execute them */
  select distinct
         cat("create table ", qtr,
             " as select * from have natural join qtr where qtr=", quote(qtr))
    into :sqlCmd separated by ';'
    from qtr;

  &sqlCmd.;

  drop table qtr;
quit;
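Each statement executed through &sqlCmd would look roughly like this (Q2_2011 is just an example of the label form the query builds):

create table Q2_2011 as
  select * from have natural join qtr where qtr="Q2_2011"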

My tests seem to indicate that when the data is already sorted by date, the two approaches take about the same time. But when the order is random, the hash-based approach takes about twice as long. In both cases (sorted or not), hash takes more than 10 times the memory that SQL does. The log is attached.

PG

Attachment
Respected Advisor
Posts: 3,892

Re: Split data into quarters

I wouldn't have expected a pure SQL solution to perform that well. Interesting.

☑ This topic is SOLVED.


Discussion stats
  • 15 replies
  • 1163 views
  • 1 like
  • 7 in conversation