Hi,
Can you please help me split the file into quarters according to filldat (column F in the attached file)?
For instance, if the date is 01/25/2011 it should be treated as Q1_2011, and if the date is 12/25/2010 it should be Q4_2010.
I have attached similar code that splits by month, but I am not sure how to do it for quarters.
Thanks,
+Sathyan
How about:
proc import datafile='d:\software\SampleFile.xls' out=have dbms=excel replace;
getnames=yes;
run;
data have(drop=_f);
set have(rename=(filldat=_f));
filldat=input(_f,mmddyy10.);
format filldat mmddyy10.;
run;
proc sort data=have;
by filldat;
run;
proc sql noprint;
select quote(name) into : list separated by ','
from dictionary.columns
where libname='WORK' and memname='HAVE';
quit;
data _null_;
declare hash ha(multidata:'Y');
ha.definekey('filldat');
ha.definedata(&list);
ha.definedone();
do until(last.filldat);
set have;
by filldat groupformat;
format filldat yyq.;
ha.add();
end;
ha.output(dataset: cats('Q',qtr(filldat),'_',year(filldat)));
run;
Ksharp
There is a function to get the quarter of the year, and its result can then be concatenated with the corresponding year.
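As a quick illustration (a minimal sketch; `filldat` is assumed here to already be a numeric SAS date value):

```sas
data _null_;
filldat = '25JAN2011'd;
/* QTR() gives 1-4, YEAR() the 4-digit year; CATS() trims and joins */
quarter = cats('Q', qtr(filldat), '_', year(filldat));
put quarter=; /* Q1_2011 */
run;
```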
Hi,
Try this..
data want;
input date;
informat date mmddyy10.;
format date date9.;
year=year(date);
qtr=qtr(date);
DateValue=compress('Q'||qtr||'_'||year);
cards;
01/25/2011
12/25/2010
;
run;
Thanks,
Shiva
Or this way?
proc sql noprint;
select distinct
cats('want_',put(filldat,yyq6.)) ,
cat(' ',"when('",put(filldat,yyq6.),"') output ",cats('want_',put(filldat,yyq6.),';'))
into :Data_DsList separated by ' ' ,
:Out_DsList separated by ' '
from work.SampleFile
;
quit;
data &Data_DsList;
set work.SampleFile;
select(put(filldat,yyq6.));
&Out_DsList
otherwise;
end;
run;
Patrick,
I want to know which code is more efficient, yours or mine.
Ksharp
Mine, of course! It uses fewer characters :smileylaugh:
...and more seriously: I believe the PROC SORT is quite costly.
No, I do not think so.
You used PUT() for every observation, which is very inefficient.
I bet mine is faster than yours.
Hope OP can give us some feedback.
Ksharp
You're writing the data 3 times to disk - even worse :smileyshocked:
O.K.... you win. Unless I feed fewer rows to the SQL with the PUT statements, as done below.
But then: you could replace your PROC SORT with an SQL sort creating a view, and I assume we would be more or less even.
Good to be reminded: I'm going to use functions next time.
16 data work.SampleFile;
17 set work.SampleFile;
18 do _i=1 to 20;
19 output;
20 end;
21 run;
NOTE: There were 56720 observations read from the data set WORK.SAMPLEFILE.
NOTE: The data set WORK.SAMPLEFILE has 1134400 observations and 41 variables.
NOTE: DATA statement used (Total process time):
real time 0.67 seconds
user cpu time 0.23 seconds
system cpu time 0.42 seconds
memory 30629.37k
OS Memory 37732.00k
Timestamp 27/04/2012 08:19:20 PM
22
23 data have;
24 set work.SampleFile;
25 format filldat mmddyy10.;
26 run;
NOTE: There were 1134400 observations read from the data set WORK.SAMPLEFILE.
NOTE: The data set WORK.HAVE has 1134400 observations and 41 variables.
NOTE: DATA statement used (Total process time):
real time 1.07 seconds
user cpu time 0.26 seconds
system cpu time 0.78 seconds
memory 30629.37k
OS Memory 37732.00k
Timestamp 27/04/2012 08:19:21 PM
2 The SAS System 20:18 Friday, April 27, 2012
27
28 options fullstimer;
29 /* Patrick */
30 proc sql noprint;
31 create view vSampleFile as select distinct filldat from SampleFile;
NOTE: SQL view WORK.VSAMPLEFILE has been defined.
32 select distinct
33 cats('want_',put(filldat,yyq6.)) ,
34 cat(' ',"when('",put(filldat,yyq6.),"') output ",cats('want_',put(filldat,yyq6.),';
34 ! '))
35 into :Data_DsList separated by ' ' ,
36 :Out_DsList separated by ' '
37 from work.vSampleFile
38 ;
39 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.96 seconds
user cpu time 0.56 seconds
system cpu time 0.15 seconds
memory 46209.40k
OS Memory 53564.00k
Timestamp 27/04/2012 08:19:22 PM
40
41 data &Data_DsList;
42 set work.SampleFile;
43 select(put(filldat,yyq6.));
44 &Out_DsList
45 otherwise;
46 end;
47 run;
NOTE: There were 1134400 observations read from the data set WORK.SAMPLEFILE.
NOTE: The data set WORK.WANT_2011Q2 has 255600 observations and 41 variables.
NOTE: The data set WORK.WANT_2011Q3 has 283200 observations and 41 variables.
NOTE: The data set WORK.WANT_2011Q4 has 300000 observations and 41 variables.
NOTE: The data set WORK.WANT_2012Q1 has 295600 observations and 41 variables.
NOTE: DATA statement used (Total process time):
real time 5.78 seconds
user cpu time 0.53 seconds
system cpu time 0.60 seconds
memory 46209.40k
OS Memory 53564.00k
Timestamp 27/04/2012 08:19:28 PM
48
49
50 /* Ksharp */
51 proc sort data=have;by filldat;run;
NOTE: There were 1134400 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.HAVE has 1134400 observations and 41 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 9.96 seconds
user cpu time 1.37 seconds
system cpu time 1.70 seconds
memory 268894.25k
OS Memory 276588.00k
Timestamp 27/04/2012 08:19:38 PM
52 proc sql noprint;
53 select quote(name) into : list separated by ','
54 from dictionary.columns
55 where libname='WORK' and memname='HAVE';
56 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.05 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 268894.25k
OS Memory 276588.00k
Timestamp 27/04/2012 08:19:39 PM
57 data _null_;
58 declare hash ha(multidata:'Y');
59 ha.definekey('filldat');
60 ha.definedata(&list);
61 ha.definedone();
62 do until(last.filldat);
63 set have;
64 by filldat groupformat;
65 format filldat yyq.;
66 ha.add();
67 end;
68 ha.output(dataset: cats('Q',qtr(filldat),'_',year(filldat)));
69 run;
NOTE: The data set WORK.Q2_2011 has 255600 observations and 41 variables.
NOTE: The data set WORK.Q3_2011 has 283200 observations and 41 variables.
NOTE: The data set WORK.Q4_2011 has 300000 observations and 41 variables.
NOTE: The data set WORK.Q1_2012 has 295600 observations and 41 variables.
NOTE: There were 1134400 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
real time 6.34 seconds
user cpu time 1.65 seconds
system cpu time 0.79 seconds
memory 549040.51k
OS Memory 557128.00k
Timestamp 27/04/2012 08:19:46 PM
Hi Patrick,
I guess the OP's dataset might already be sorted, so my PROC SORT may not be needed.
Just a guess.
Ksharp
If you want high efficiency, collect the data in memory - either an array or a hash table indexed by put(filldat, qtr_format.) - and write it out at end-of-file.
For an array you would need to know the range (min/max dates), but a hash table solution needs only a single pass. (For an array solution, you could presume "outer" bounds like 1970-Q1 to 2100-Q1 without taking a lot of memory, though the lower bound might need to be much earlier for data like date-of-birth.)
Another high-efficiency alternate solution writes to a separate table for each distinct quarter, in one data step (needing no sort as there will only be a relatively small number of quarters).
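The single-pass hash idea above could be sketched roughly like this (a hash-of-hashes pattern: one inner hash per distinct quarter, dumped to its own dataset at end-of-file; dataset and variable names follow the thread, and the DEFINEDATA() call would need the full variable list in real code):

```sas
data _null_;
length q $7;
/* outer hash: one entry per distinct quarter */
declare hash hoh();
hoh.definekey('q');
hoh.definedata('q','h');
hoh.definedone();
declare hash h;
do until(eof);
   set have end=eof;
   q=cats('Q',qtr(filldat),'_',year(filldat));
   if hoh.find() ne 0 then do; /* first row of a new quarter */
      h=_new_ hash(multidata:'Y');
      h.definekey('q');
      h.definedata('filldat' /* ...other variables... */);
      h.definedone();
      hoh.add();
   end;
   h.add(); /* bucket the row under its quarter */
end;
/* write each quarter's bucket to its own dataset, no sort needed */
declare hiter hi('hoh');
do while(hi.next()=0);
   h.output(dataset:q);
end;
run;
```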
Thanks Peter.
This little "contest" between Ksharp and me wasn't really serious. My code was never designed for performance, and I don't believe performance is important for the OP's real problem. I liked the discussion with Ksharp - even when not entirely serious, there is always something to learn and take away from it.
Patrick, Ksharp, your skilled discussion is very stimulating. There is a lot to be learned in trying to sort out such performance issues. It set me doing my own "testing" along the same approach as Patrick's. I compared a simple SQL solution inspired by your comments with the Hash-based program proposed by Ksharp. The SQL queries go as follows:
proc sql noprint;
create table qtr as select fillDate,
cats('Q',qtr(fillDate),'_',year(fillDate)) as qtr length=7
from (select distinct fillDate from have)
order by fillDate;
select distinct cat("create table ", qtr,
" as select * from have natural join qtr where qtr=", quote(qtr))
into :sqlCmd separated by ';'
from qtr;
&sqlCmd.;
drop table qtr;
quit;
My tests seem to indicate that when the data is already sorted by date, the two approaches take about the same time. But when the order is random, the hash-based approach takes about twice as long. In both cases (sorted or not), hash takes more than 10 times the memory that SQL does. The log is attached.
PG
I wouldn't have expected a pure SQL solution to perform that well. Interesting.