Hi,
Could you please help me to split the data in to multiple data sets based on the Year and month.
For e.g My input data contains two year data, I want to split it in to 24 small data sets based on the month and year.
I used the below query to get month and year.
data
set claim_file
informat x_date date8.
format x_date date8.
M = month(x_date)
Y = year(x_date)
run
Out put should have a data set name like Jan_2011, Feb_2011 etc..
Please help.
Thanks.
Here is a version works on your real data based on Art's code, you may need to change libref location though.
libname myxl 'h:\data.xls';
data temp;
set myxl.'sheet1$'n;
format dateoffill date9.;
dateoffill=input(put(dateoffill,8.),yymmdd8.);
month=catx('_',put(dateoffill,monname3.),year(dateoffill));
run;
proc sort data=temp;
by month;
run;
proc sql noprint;
select quote(name)
into :vars separated by ','
from dictionary.columns
where libname="WORK" and
memname="TEMP" and
name ne "month"
;
quit;
data _null_ ;
dcl hash hh () ;
hh.definekey ('k') ;
hh.definedata (&vars.) ;
hh.definedone () ;
do k = 1 by 1 until ( last.month ) ;
set temp ;
by month;
hh.add () ;
end ;
hh.output (dataset: month) ;
run ;
Regards,
Haikuo
See http://www.sascommunity.org/wiki/Split_Data_into_Subsets
Be sure to notice the advisory against doing this.
skallamp wrote:
Hi,
Could you please help me to split the data in to multiple data sets based on the Year and month.
For e.g My input data contains two year data, I want to split it in to 24 small data sets based on the month and year.
I used the below query to get month and year.
data
set claim_file
informat x_date date8.
format x_date date8.
M = month(x_date)
Y = year(x_date)
run
Out put should have a data set name like Jan_2011, Feb_2011 etc..
Please help.
Thanks.
data have; input month : yymmn. no; format month yymmn.; cards; 201001 1 201001 2 201002 1 201002 3 201003 4 201003 5 201004 4 201004 5 201004 6 201005 1 201005 5 201006 1 201006 5 201006 7 201006 8 201006 9 201007 5 201007 7 201008 9 201009 4 201010 10 201010 11 201011 1 201011 12 201012 7 201101 1 201102 11 201103 12 ; run; proc sort data=have;by month;run; data _null_ ; dcl hash hh () ; hh.definekey ('k') ; hh.definedata ('month','no') ; hh.definedone () ; do k = 1 by 1 until ( last.month ) ; set have ; by month groupformat ; format month monyy.; hh.add () ; end ; hh.output (dataset: put(month,monyy5.)) ; run ;
Ksharp
Thanks for the thoughts. I am getting the below error while applying it to my data.
15 data _null_ ;
16 dcl hash hh () ;
17 hh.definekey ('k') ;
18 hh.definedata ('RX_DATE','no') ;
19 hh.definedone () ;
20 do k = 1 by 1 until ( last.RX_DATE ) ;
21 set date ;
22 by rx_date groupformat ;
23 format rx_date monyy.;
24 hh.add () ;
25 end ;
26 hh.output(dataset: put(rx_date,monyy5.)) ;
27 run ;
ERROR: Undeclared data symbol no for hash object at line 19 column 3.
ERROR: DATA STEP Component Object failure. Aborted during the EXECUTION phase.
If I have more variables, how do I put everything with out typing it one by one.
Here hh.definedata('rx_date') is giving me only one variable. I need all of them which is present in my data set.
add all your variables:
hh.definedata('rx_date','your_var1','your_var2')
My Variables are keep changing from one work to another.
Is there a way I can add all variables using a simple statement? somethin like *..
I just learned something new from Hai.kuo but, since I created a new field in my suggested code, used proc sql to build a list of all the variables EXCEPT the one I created. While a I agree with Howles that you probably DON'T want to do this, would something like the following suffice?:
data date;
informat rx_date mmddyy8.;
input rx_date no value;
format rx_date date8.;
cards;
01012010 1 1
01012010 2 2
02012010 1 3
02012010 3 4
03012010 4 3
03012010 5 2
04012010 4 1
04012010 5 0
04012010 6 7
05012010 1 6
05012010 5 5
06012010 1 4
06012010 5 3
06012010 7 2
06012010 8 1
06012010 9 2
07012010 5 3
07012010 7 4
08012010 9 5
09012010 4 6
10012010 10 7
10012010 11 8
11012010 1 9
11012010 12 10
12012010 7 33
01012011 1 32
02012011 11 47
03012011 12 24
;
data temp;
set date;
month=catx('_',put(rx_date,monname3.),year(rx_date));
run;
proc sort data=temp;
by month;
run;
proc sql noprint;
select quote(name)
into :vars separated by ','
from dictionary.columns
where libname="WORK" and
memname="TEMP" and
name ne "month"
;
quit;
data _null_ ;
dcl hash hh () ;
hh.definekey ('k') ;
hh.definedata (&vars.) ;
hh.definedone () ;
do k = 1 by 1 until ( last.month ) ;
set temp ;
by month;
hh.add () ;
end ;
hh.output (dataset: month) ;
run ;
Hi Art,
Its giving me the below error.
36 dcl hash hh () ;
37 hh.definekey ('k') ;
38 hh.definedata (&vars.) ;
_
386
200
76
WARNING: Apparent symbolic reference VARS not resolved.
ERROR 386-185: Expecting an arithmetic expression.
ERROR 200-322: The symbol is not recognized and will be ignored.
ERROR 76-322: Syntax error, statement will be ignored.
39 hh.definedone () ;
ERROR: DATA STEP Component Object failure. Aborted during the COMPILATION phase.
NOTE: The SAS System stopped processing this step because of errors.
Like Linlin mentioned, there was an error in my sql statement. However, it should not have been "date" but, rather, "temp". i.e.,
proc sql noprint;
select quote(name)
into :vars separated by ','
from dictionary.columns
where libname="WORK" and
memname="TEMP" and
name ne "month"
;
quit;
Since you ran my original code suggesting using "need", you should have gotten an error when you ran the proc sql code.
Art
Art,
Why 'temp' not 'date'? Thank you!
Hi This is what I am using...still i m getting error
data temp;
set claim_detail;
month=catx('_',put(rx_date,monname3.),year(rx_date));
run;
proc sort data=temp;
by month;
run;
proc sql noprint;
select quote(name)
into :vars separated by ','
from dictionary.columns
where libname="WORK" and
memname="NEED" and
name ne "month"
;
quit;
data _null_ ;
dcl hash hh () ;
hh.definekey ('k') ;
hh.definedata (&vars.) ;
hh.definedone () ;
do k = 1 by 1 until ( last.month ) ;
set temp ;
by month;
hh.add () ;
end ;
hh.output (dataset: month) ;
run ;
change
memname="NEED"
to
memname="TEMP"
My error is b'cse of this statement.
Temp output has blank value for month...my rx_date format is yyyymmdd format
month=catx(
),year(rx_date));
Hi Art,
I think ' memname="NEED" and' in your code should be ' memname="TEMP" and'
Thank you! - Linlin
Updated after Art's comments
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.