Hi,
Can you please help me to split the file in to quarters according to filldat (column F in the attached).
For instance if the date is 01/25/2011 it sould be treated as Q1_2011 and if the date is 12/25/2010 it should be Q4_2010.
I have attached similar code for split to months, but not sure on how to do for Quarters.
Thanks,
+Sathyan
How about:
proc import datafile='d:\software\SampleFile.xls' out=have dbms=excel replace; getnames=yes;run; data have(drop=_f); set have(rename=(filldat=_f)); filldat=input(_f,mmddyy10.); format filldat mmddyy10.; run; proc sort data=have;by filldat;run; proc sql noprint; select quote(name) into : list separated by ',' from dictionary.columns where libname='WORK' and memname='HAVE'; quit; data _null_; declare hash ha(multidata:'Y'); ha.definekey('filldat'); ha.definedata(&list); ha.definedone(); do until(last.filldat); set have; by filldat groupformat; format filldat yyq.; ha.add(); end; ha.output(dataset: cats('Q',qtr(filldat),'_',year(filldat))); run;
Ksharp
There is function tonget the quarter of year and then it can be concatenated with correspodning year.
Hi,
Try this..
data want; | |
input date; | |
informat date mmddyy10.; | |
format date date9.; | |
year=year(date); | |
qtr=qtr(date); | |
DateValue=compress('Q'||qtr||'_'||year); | |
cards; | |
01/25/2011 |
12/25/2010
; | |
run; |
Thanks,
Shiva
How about:
proc import datafile='d:\software\SampleFile.xls' out=have dbms=excel replace; getnames=yes;run; data have(drop=_f); set have(rename=(filldat=_f)); filldat=input(_f,mmddyy10.); format filldat mmddyy10.; run; proc sort data=have;by filldat;run; proc sql noprint; select quote(name) into : list separated by ',' from dictionary.columns where libname='WORK' and memname='HAVE'; quit; data _null_; declare hash ha(multidata:'Y'); ha.definekey('filldat'); ha.definedata(&list); ha.definedone(); do until(last.filldat); set have; by filldat groupformat; format filldat yyq.; ha.add(); end; ha.output(dataset: cats('Q',qtr(filldat),'_',year(filldat))); run;
Ksharp
Or this way?
proc sql noprint;
select distinct
cats('want_',put(filldat,yyq6.)) ,
cat(' ',"when('",put(filldat,yyq6.),"') output ",cats('want_',put(filldat,yyq6.),';'))
into :Data_DsList separated by ' ' ,
:Out_DsList separated by ' '
from work.SampleFile
;
quit;
data &Data_DsList;
set work.SampleFile;
select(put(filldat,yyq6.));
&Out_DsList
otherwise;
end;
run;
Patrick,
I want know, yours and mine ,which code is more efficient .
Ksharp
Mine of course! It uses less characters :smileylaugh:
...and more serious: I believe the Proc Sort to be quite costly.
No. I do not think so.
You used PUT() for every observations ,which is very low efficient.
I bet mine is faster than yours.
Hope OP can give us some feedback.
Ksharp
You're writing the data 3 times to disk - even worse :smileyshocked:
O.K.... you win. Unless I feed less rows to the SQL with the put statements as done below.
But then: You could replace your Proc Sort with a SQL sort creating a view and I assume we would be more or less even.
Good to be reminded: I'm going to use functions next time.
16 data work.SampleFile;
17 set work.SampleFile;
18 do _i=1 to 20;
19 output;
20 end;
21 run;
NOTE: There were 56720 observations read from the data set WORK.SAMPLEFILE.
NOTE: The data set WORK.SAMPLEFILE has 1134400 observations and 41 variables.
NOTE: DATA statement used (Total process time):
real time 0.67 seconds
user cpu time 0.23 seconds
system cpu time 0.42 seconds
memory 30629.37k
OS Memory 37732.00k
Timestamp 27/04/2012 08:19:20 PM
22
23 data have;
24 set work.SampleFile;
25 format filldat mmddyy10.;
26 run;
NOTE: There were 1134400 observations read from the data set WORK.SAMPLEFILE.
NOTE: The data set WORK.HAVE has 1134400 observations and 41 variables.
NOTE: DATA statement used (Total process time):
real time 1.07 seconds
user cpu time 0.26 seconds
system cpu time 0.78 seconds
memory 30629.37k
OS Memory 37732.00k
Timestamp 27/04/2012 08:19:21 PM
2 The SAS System 20:18 Friday, April 27, 2012
27
28 options fullstimer;
29 /* Patrick */
30 proc sql noprint;
31 create view vSampleFile as select distinct filldat from SampleFile;
NOTE: SQL view WORK.VSAMPLEFILE has been defined.
32 select distinct
33 cats('want_',put(filldat,yyq6.)) ,
34 cat(' ',"when('",put(filldat,yyq6.),"') output ",cats('want_',put(filldat,yyq6.),';
34 ! '))
35 into :Data_DsList separated by ' ' ,
36 :Out_DsList separated by ' '
37 from work.vSampleFile
38 ;
39 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.96 seconds
user cpu time 0.56 seconds
system cpu time 0.15 seconds
memory 46209.40k
OS Memory 53564.00k
Timestamp 27/04/2012 08:19:22 PM
40
41 data &Data_DsList;
42 set work.SampleFile;
43 select(put(filldat,yyq6.));
44 &Out_DsList
45 otherwise;
46 end;
47 run;
NOTE: There were 1134400 observations read from the data set WORK.SAMPLEFILE.
NOTE: The data set WORK.WANT_2011Q2 has 255600 observations and 41 variables.
NOTE: The data set WORK.WANT_2011Q3 has 283200 observations and 41 variables.
NOTE: The data set WORK.WANT_2011Q4 has 300000 observations and 41 variables.
NOTE: The data set WORK.WANT_2012Q1 has 295600 observations and 41 variables.
NOTE: DATA statement used (Total process time):
real time 5.78 seconds
user cpu time 0.53 seconds
system cpu time 0.60 seconds
memory 46209.40k
OS Memory 53564.00k
Timestamp 27/04/2012 08:19:28 PM
48
49
50 /* Ksharp */
51 proc sort data=have;by filldat;run;
3 The SAS System 20:18 Friday, April 27, 2012
NOTE: There were 1134400 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.HAVE has 1134400 observations and 41 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 9.96 seconds
user cpu time 1.37 seconds
system cpu time 1.70 seconds
memory 268894.25k
OS Memory 276588.00k
Timestamp 27/04/2012 08:19:38 PM
52 proc sql noprint;
53 select quote(name) into : list separated by ','
54 from dictionary.columns
55 where libname='WORK' and memname='HAVE';
56 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.05 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 268894.25k
OS Memory 276588.00k
Timestamp 27/04/2012 08:19:39 PM
57 data _null_;
58 declare hash ha(multidata:'Y');
59 ha.definekey('filldat');
60 ha.definedata(&list);
61 ha.definedone();
62 do until(last.filldat);
63 set have;
64 by filldat groupformat;
65 format filldat yyq.;
66 ha.add();
67 end;
68 ha.output(dataset: cats('Q',qtr(filldat),'_',year(filldat)));
69 run;
NOTE: The data set WORK.Q2_2011 has 255600 observations and 41 variables.
NOTE: The data set WORK.Q3_2011 has 283200 observations and 41 variables.
NOTE: The data set WORK.Q4_2011 has 300000 observations and 41 variables.
NOTE: The data set WORK.Q1_2012 has 295600 observations and 41 variables.
NOTE: There were 1134400 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
real time 6.34 seconds
user cpu time 1.65 seconds
system cpu time 0.79 seconds
memory 549040.51k
OS Memory 557128.00k
Timestamp 27/04/2012 08:19:46 PM
4 The SAS System 20:18 Friday, April 27, 2012
70
71
72 GOPTIONS NOACCESSIBLE;
73 %LET _CLIENTTASKLABEL=;
74 %LET _CLIENTPROJECTPATH=;
75 %LET _CLIENTPROJECTNAME=;
76 %LET _SASPROGRAMFILE=;
77
78 ;*';*";*/;quit;run;
79 ODS _ALL_ CLOSE;
80
81
82 QUIT; RUN;
83
Hi.Patrick,
I guess OP's dataset might have already been sorted before.
My ' proc sort ' maybe doesn't need any more.
Just Guess.
Ksharp
if you want high effficiency, collect the data in memory - either array or hash table indexed by put( filldat, qtr_format. )
and write the array at end-of-file.
For an array you would need to know the range (min/max dates) but a hash table solution would need only the single pass. (For an array solution, you could presume "outer" bounds like 1970-q1 to 2100-q1 without taking a lot of memory, but of course, the lower bound might need to be much earlier for data like date-of-birth).
Another high-efficiency alternate solution writes to a separate table for each distinct quarter, in one data step (needing no sort as there will only be a relatively small number of quarters).
Thanks Peter.
This little "contest" between Ksharp and I wasn't really serious. My code was never designed for performance and I don't believe that for the OP's real problem performance is importants. I liked the discussion with Ksharp - and even when not really serious there is always something to learn and take out of it.
Patrick, Ksharp, your skilled discussion is very stimulating. There is a lot to be learned in trying to sort out such performance issues. It set me doing my own "testing" along the same approach as Patrick's. I compared a simple SQL solution inspired by your comments with the Hash-based program proposed by Ksharp. The SQL queries go as follows:
proc sql noprint;
create table qtr as select fillDate,
cats('Q',qtr(fillDate),'_',year(fillDate)) as qtr length=7
from (select distinct fillDate from have)
order by fillDate;
select distinct cat("create table ", qtr,
" as select * from have natural join qtr where qtr=", quote(qtr))
into :sqlCmd separated by ';'
from qtr;
&sqlCmd.;
drop table qtr;
quit;
My tests seem to indicate that when the data is already sorted by date, the two approaches take about the same time. But when the order is random, the hash-based approach takes about twice as long. In both cases (sorted or not), hash takes more than 10 times the memory that SQL does. The log is attached.
PG
I wouldn't have expected a pure SQL solution to perform that good. Interesting.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.