BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
skallamp
Obsidian | Level 7


Hi,

Can you please help me to split the file in to quarters according to filldat (column F in the attached).

For instance if the date is 01/25/2011 it sould be treated as Q1_2011 and if the date is 12/25/2010 it should be Q4_2010.

I have attached similar code for split to months, but not sure on how to do for Quarters.

Thanks,

+Sathyan

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

How about:

proc import datafile='d:\software\SampleFile.xls' out=have dbms=excel replace;
getnames=yes;run;
data have(drop=_f);
 set have(rename=(filldat=_f));
 filldat=input(_f,mmddyy10.);
 format filldat mmddyy10.;
run;
proc sort data=have;by filldat;run;
proc sql noprint;
 select quote(name) into : list separated by ','
  from dictionary.columns
   where libname='WORK' and memname='HAVE';
quit; 
data _null_;
declare hash ha(multidata:'Y');
 ha.definekey('filldat');
 ha.definedata(&list);
 ha.definedone();
 do until(last.filldat);
  set have;
  by filldat groupformat;
  format filldat yyq.;
  ha.add();
 end;
 ha.output(dataset: cats('Q',qtr(filldat),'_',year(filldat)));
run;



Ksharp

View solution in original post

15 REPLIES 15
manojinpec
Obsidian | Level 7

There is function tonget the quarter of year and then it can be concatenated with correspodning year.

shivas
Pyrite | Level 9

Hi,

Try this..

data want;
input date;
informat date mmddyy10.;
format date date9.;
year=year(date);
qtr=qtr(date);
DateValue=compress('Q'||qtr||'_'||year);
cards;
01/25/2011

12/25/2010

;
run;

Thanks,

Shiva

Ksharp
Super User

How about:

proc import datafile='d:\software\SampleFile.xls' out=have dbms=excel replace;
getnames=yes;run;
data have(drop=_f);
 set have(rename=(filldat=_f));
 filldat=input(_f,mmddyy10.);
 format filldat mmddyy10.;
run;
proc sort data=have;by filldat;run;
proc sql noprint;
 select quote(name) into : list separated by ','
  from dictionary.columns
   where libname='WORK' and memname='HAVE';
quit; 
data _null_;
declare hash ha(multidata:'Y');
 ha.definekey('filldat');
 ha.definedata(&list);
 ha.definedone();
 do until(last.filldat);
  set have;
  by filldat groupformat;
  format filldat yyq.;
  ha.add();
 end;
 ha.output(dataset: cats('Q',qtr(filldat),'_',year(filldat)));
run;



Ksharp

Patrick
Opal | Level 21

Or this way?


proc sql noprint;
  select distinct
    cats('want_',put(filldat,yyq6.)) ,
    cat(' ',"when('",put(filldat,yyq6.),"') output ",cats('want_',put(filldat,yyq6.),';'))
      into :Data_DsList separated by ' ' ,
           :Out_DsList separated by ' '
  from work.SampleFile
  ;
quit;


data &Data_DsList;
  set work.SampleFile;
  select(put(filldat,yyq6.));
    &Out_DsList
    otherwise;
  end;
run;

Ksharp
Super User

Patrick,

I want know, yours and mine ,which code is more efficient .

Smiley Happy

Ksharp

Patrick
Opal | Level 21

Mine of course! It uses less characters :smileylaugh:

...and more serious: I believe the Proc Sort to be quite costly.

Ksharp
Super User

No. I do not think so.

You used PUT() for every observations ,which is very low efficient.

I bet mine is faster than yours. Smiley Happy

Hope OP can give us some feedback.

Ksharp

Patrick
Opal | Level 21

You're writing the data 3 times to disk - even worse :smileyshocked:

Patrick
Opal | Level 21

O.K.... you win. Unless I feed less rows to the SQL with the put statements as done below.

But then: You could replace your Proc Sort with a SQL sort creating a view and I assume we would be more or less even.

Good to be reminded: I'm going to use functions next time.Smiley Happy

16         data work.SampleFile;
17           set work.SampleFile;
18           do _i=1 to 20;
19             output;
20           end;
21         run;

NOTE: There were 56720 observations read from the data set WORK.SAMPLEFILE.
NOTE: The data set WORK.SAMPLEFILE has 1134400 observations and 41 variables.
NOTE: DATA statement used (Total process time):
      real time           0.67 seconds
      user cpu time       0.23 seconds
      system cpu time     0.42 seconds
      memory              30629.37k
      OS Memory           37732.00k
      Timestamp           27/04/2012 08:19:20 PM
     

22        
23         data have;
24          set work.SampleFile;
25          format filldat mmddyy10.;
26         run;

NOTE: There were 1134400 observations read from the data set WORK.SAMPLEFILE.
NOTE: The data set WORK.HAVE has 1134400 observations and 41 variables.
NOTE: DATA statement used (Total process time):
      real time           1.07 seconds
      user cpu time       0.26 seconds
      system cpu time     0.78 seconds
      memory              30629.37k
      OS Memory           37732.00k
      Timestamp           27/04/2012 08:19:21 PM
     
2                                         The SAS System              20:18 Friday, April 27, 2012


27        
28         options fullstimer;
29         /* Patrick */
30         proc sql noprint;
31           create view vSampleFile as select distinct filldat from SampleFile;
NOTE: SQL view WORK.VSAMPLEFILE has been defined.
32           select distinct
33             cats('want_',put(filldat,yyq6.)) ,
34             cat(' ',"when('",put(filldat,yyq6.),"') output ",cats('want_',put(filldat,yyq6.),';
34       ! '))
35               into :Data_DsList separated by ' ' ,
36                    :Out_DsList separated by ' '
37           from work.vSampleFile
38           ;
39         quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.96 seconds
      user cpu time       0.56 seconds
      system cpu time     0.15 seconds
      memory              46209.40k
      OS Memory           53564.00k
      Timestamp           27/04/2012 08:19:22 PM
     

40        
41         data &Data_DsList;
42           set work.SampleFile;
43           select(put(filldat,yyq6.));
44             &Out_DsList
45             otherwise;
46           end;
47         run;

NOTE: There were 1134400 observations read from the data set WORK.SAMPLEFILE.
NOTE: The data set WORK.WANT_2011Q2 has 255600 observations and 41 variables.
NOTE: The data set WORK.WANT_2011Q3 has 283200 observations and 41 variables.
NOTE: The data set WORK.WANT_2011Q4 has 300000 observations and 41 variables.
NOTE: The data set WORK.WANT_2012Q1 has 295600 observations and 41 variables.
NOTE: DATA statement used (Total process time):
      real time           5.78 seconds
      user cpu time       0.53 seconds
      system cpu time     0.60 seconds
      memory              46209.40k
      OS Memory           53564.00k
      Timestamp           27/04/2012 08:19:28 PM
     

48        
49        
50         /* Ksharp */
51         proc sort data=have;by filldat;run;

3                                         The SAS System              20:18 Friday, April 27, 2012

NOTE: There were 1134400 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.HAVE has 1134400 observations and 41 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           9.96 seconds
      user cpu time       1.37 seconds
      system cpu time     1.70 seconds
      memory              268894.25k
      OS Memory           276588.00k
      Timestamp           27/04/2012 08:19:38 PM
     

52         proc sql noprint;
53          select quote(name) into : list separated by ','
54           from dictionary.columns
55            where libname='WORK' and memname='HAVE';
56         quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.05 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              268894.25k
      OS Memory           276588.00k
      Timestamp           27/04/2012 08:19:39 PM
     

57         data _null_;
58         declare hash ha(multidata:'Y');
59          ha.definekey('filldat');
60          ha.definedata(&list);
61          ha.definedone();
62          do until(last.filldat);
63           set have;
64           by filldat groupformat;
65           format filldat yyq.;
66           ha.add();
67          end;
68          ha.output(dataset: cats('Q',qtr(filldat),'_',year(filldat)));
69         run;

NOTE: The data set WORK.Q2_2011 has 255600 observations and 41 variables.
NOTE: The data set WORK.Q3_2011 has 283200 observations and 41 variables.
NOTE: The data set WORK.Q4_2011 has 300000 observations and 41 variables.
NOTE: The data set WORK.Q1_2012 has 295600 observations and 41 variables.
NOTE: There were 1134400 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
      real time           6.34 seconds
      user cpu time       1.65 seconds
      system cpu time     0.79 seconds
      memory              549040.51k
      OS Memory           557128.00k
      Timestamp           27/04/2012 08:19:46 PM
     

4                                         The SAS System              20:18 Friday, April 27, 2012

70        
71        
72         GOPTIONS NOACCESSIBLE;
73         %LET _CLIENTTASKLABEL=;
74         %LET _CLIENTPROJECTPATH=;
75         %LET _CLIENTPROJECTNAME=;
76         %LET _SASPROGRAMFILE=;
77        
78         ;*';*";*/;quit;run;
79         ODS _ALL_ CLOSE;
80        
81        
82         QUIT; RUN;
83        

Ksharp
Super User

Hi.Patrick,

I guess OP's dataset might have already been sorted before.

My ' proc sort ' maybe doesn't need  any more.

Just Guess. Smiley Happy

Ksharp

Peter_C
Rhodochrosite | Level 12

if you want high effficiency, collect the data in memory - either array or hash table indexed by put( filldat, qtr_format. )

and write the array at end-of-file.

For an array you would need to know the range (min/max dates) but a hash table solution would need only the single pass. (For an array solution, you could presume "outer" bounds like 1970-q1 to 2100-q1 without taking a lot of memory, but of course, the lower bound might need to be much earlier for data like date-of-birth).

Another high-efficiency alternate solution writes to a separate table for each distinct quarter, in one data step (needing no sort as there will only be a relatively small number of quarters).

Patrick
Opal | Level 21

Thanks Peter.

This little "contest" between Ksharp and I wasn't really serious. My code was never designed for performance and I don't believe that for the OP's real problem performance is importants. I liked the discussion with Ksharp - and even when not really serious there is always something to learn and take out of it.

PGStats
Opal | Level 21

Patrick, Ksharp, your skilled discussion is very stimulating. There is a lot to be learned in trying to sort out such performance issues. It set me doing my own "testing" along the same approach as Patrick's. I compared a simple SQL solution inspired by your comments with the Hash-based program proposed by Ksharp. The SQL queries go as follows:

proc sql noprint;

create table qtr as select fillDate,

cats('Q',qtr(fillDate),'_',year(fillDate)) as qtr length=7

from (select distinct fillDate from have)

order by fillDate;

select distinct cat("create table ", qtr,

" as select * from have natural join qtr where qtr=", quote(qtr))

into :sqlCmd separated by ';'

from qtr;

&sqlCmd.;

drop table qtr;

quit;

My tests seem to indicate that when the data is already sorted by date, the two approaches take about the same time. But when the order is random, the hash-based approach takes about twice as long. In both cases (sorted or not), hash takes more than 10 times the memory that SQL does. The log is attached.

PG

PG
Patrick
Opal | Level 21

I wouldn't have expected a pure SQL solution to perform that good. Interesting.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 15 replies
  • 8300 views
  • 2 likes
  • 7 in conversation