Solved: Re: Splitting a database

Demographer · Posted 12-14-2012 08:30 PM

Hello,

I have a database who looks like this:

Years	pf1	pf2	pf3	pf4	pf5	pf6	pf7	pf8	…	pf119
2006	0,99995	0,99984	0,99969	0,99954	0,99939	0,99924	0,99909	0,99894		0,99864
2007	0,90419475	0,86805513	0,95407293	0,8676255	0,9106782	0,89055109	0,99534984	0,82628474		0,85245713
2008	0,9547502	0,86854718	0,80312147	0,94014155	0,92665103	0,97910746	0,95293207	0,94262974		0,86068283
2009	0,84331363	0,81522629	0,98906643	0,88725626	0,94812541	0,85975439	0,93789548	0,98424024		0,82137827
2010	0,83663445	0,84507837	0,91888898	0,95447863	0,87323238	0,98027643	0,93558909	0,85553856		0,97474632
2011	0,86543087	0,9058221	0,80951478	0,97289518	0,80704643	0,94382676	0,89678733	0,93423198		0,89965292
2012	0,91703081	0,88559104	0,8006615	0,89746979	0,96224727	0,93283978	0,99704944	0,95622047		0,92032314
…
2056	0,91923551	0,98895088	0,94733737	0,88924355	0,84496988	0,88136723	0,86645268	0,88707045		0,9918978

I would like to create a single database for each line (without the first column, which is the name of the file), i.e.:

2006.sas7bdat:

pf1	pf2	pf3	pf4	pf5	pf6	pf7	pf8	…	pf119
0,99995	0,99984	0,99969	0,99954	0,99939	0,99924	0,99909	0,99894		0,99864

2007.sas7bdat:

pf1	pf2	pf3	pf4	pf5	pf6	pf7	pf8	…	pf119
0,90419475	0,86805513	0,95407293	0,8676255	0,9106782	0,89055109	0,99534984	0,82628474		0,85245713

etc.

Is there a way to do this with a macro or something. Otherwise, it's not very efficient to do it manually.

Thanks

bnarang · Posted 12-15-2012 02:37 AM

I would have done it this way.

%Macro temp;

Data _null_;

Set Temp end = eof;

count+1;

if eof then call symput('c_yr', compress(count));

Run;

Proc sql;

select years into : yr1 - :yr&c_yr from temp;

quit;

%do i= 1 %to &c_yr;

Data Y&&yr&i.;

Set temp ;

where years eq &&yr&i.;

Run;

%end;

%Mend;

%temp;

View solution in original post

Linlin · Posted 12-14-2012 09:40 PM

data have;

input year p1;

cards;

2005 234.1

2006 456.4

2007 786.1

;

proc sql noprint;

select year into:key separated by ' '

from have;

quit;

%macro dist;

%do i=1 %to %sysfunc(countw(&key));

%let dsn=%scan(&key,&i);

data _&dsn;

set have;

where year=&dsn;

drop year;

run;

%end;

%mend;

%dist

bnarang · Posted 12-15-2012 02:37 AM

I would have done it this way.

%Macro temp;

Data _null_;

Set Temp end = eof;

count+1;

if eof then call symput('c_yr', compress(count));

Run;

Proc sql;

select years into : yr1 - :yr&c_yr from temp;

quit;

%do i= 1 %to &c_yr;

Data Y&&yr&i.;

Set temp ;

where years eq &&yr&i.;

Run;

%end;

%Mend;

%temp;

Demographer · Posted 12-15-2012 04:31 PM

Thanks all. This forum is wonderful. Hopefully I would be as good programer as you are in the next years.

data_null__ · Posted 12-15-2012 09:08 PM

I think if you move the WHERE statements to the DATA statement as a data set options for each data set name you generate you could create all the data sets with one pass of the data.

bnarang · Posted 12-16-2012 02:08 AM

Yes, Data_null_ , you are right. that would make it fast.

Linlin · Posted 12-19-2012 10:05 PM

Hi Bhpinder,

You can combine your two steps:

Data _null_;

Set Temp end = eof;

count+1;

if eof then call symput('c_yr', compress(count));

Run;

Proc sql;

select years into : yr1 - :yr&c_yr from temp;

quit;

into one:

Data _null_;

Set Temp end = eof;

call symputx(cats('yr',_n_),years);

if eof then call symputx('c_yr',_n_);

Run;

bnarang · Posted 12-19-2012 10:40 PM

Thank you so much Linlin. I did not know about call symputx. It's such a great place to learn.

joehinson · Posted 12-15-2012 02:49 PM

The macro solutions are great for such problems.

Incidentally, the hash-of-hashes (HOH) technique was first demonstrated by Richard DeVenezia for dataset splitting.

I have used a similar HOH approach, using dataset functions to read in data:

data have;

infile datalines dsd dlm=" ";

input Years $ (pf1-pf8) ($);

datalines;

2006 0,99995 0,99984 0,99969 0,99954 0,99939 0,99924 0,99909 0,99894 0,99864

2007 0,90419475 0,86805513 0,95407293 0,8676255 0,9106782 0,89055109 0,99534984 0,82628474 0,85245713

2008 0,9547502 0,86854718 0,80312147 0,94014155 0,92665103 0,97910746 0,95293207 0,94262974 0,86068283

2009 0,84331363 0,81522629 0,98906643 0,88725626 0,94812541 0,85975439 0,93789548 0,98424024 0,82137827

2010 0,83663445 0,84507837 0,91888898 0,95447863 0,87323238 0,98027643 0,93558909 0,85553856 0,97474632

2011 0,86543087 0,9058221 0,80951478 0,97289518 0,80704643 0,94382676 0,89678733 0,93423198 0,89965292

2012 0,91703081 0,88559104 0,8006615 0,89746979 0,96224727 0,93283978 0,99704944 0,95622047 0,92032314

;

run;

data _null_;

length pf $15;

if(1=2) then set have;

declare hash p (ordered:"a");

p.defineKey("years");

p.defineData("years","ObjPF");

p.defineDone();

declare hiter hip("p");

declare hash ObjPF;

*****Bring in data using data access functions**********************;

dsid=open("have");

nrows=attrn(dsid,"NOBS");

ncols=attrn(dsid,"NVAR");

do x=1 to nrows;

rc=fetchobs(dsid,x);

years=getvarc(dsid,1);

nametext="YR"||strip(left(years));

rc=fetchobs(dsid,x);

do y=2 to ncols;

pf=getvarc(dsid,y);

***** Create Hash-of-Hashes*********************************;

if p.find() ne 0 then do;

ObjPF=_new_ hash(ordered:"a",multidata:"y");

ObjPF.defineKey("y");

ObjPF.defineData("pf");

ObjPF.defineDone();

p.replace();

end; *for p.find;

ObjPF.replace();

end; *for y;

end; *for x;

****hash-of-hashes readout and dataset creation*********************;

do rc = hip.first() by 0 while (rc = 0);

ObjPF.output (dataset:'YR_'||years);

rc = hip.next();

end; *for hip.first;

stop;

run;

art297 · Posted 12-15-2012 03:09 PM

This might be an easier hash solution:

proc sql noprint;

select cats("'",name,"'")

into :vars

separated by ","

from dictionary.columns

where libname="WORK" and

memname="HAVE" and

upcase(name) ne "YEARS"

;

quit;

data _null_ ;

dcl hash hid (ordered: 'a') ;

hid.definekey ('years') ;

hid.definedata (&vars. ) ;

hid.definedone ( ) ;

do until ( last.years ) ;

set have ;

by years ;

hid.add() ;

end ;

hid.output (dataset: "_"||put (years, best.-l)) ;

run;

PGStats · Posted 12-15-2012 09:27 PM

Another way, inspired by Linlin's :

data have;
input year p1;
cards;
2005 234.1
2006 456.4
2007 786.1
;

proc sql noprint;
select
     cats("_",year),
     cats("when(",year,") output _",year)
into
     :dts separated by ' ',
     :when separated by ';'
from have;
quit;

data &dts.;
set have;
select (year);
&when.;
end;
drop year;
run;

PG

art297 · Posted 12-16-2012 10:08 AM

: for the OP's needs, I think that yours is the easiest to understand solution. However, I'm going to continue recommending using a hash for such tasks, so that one doesn't have to be concerned with macro variable or line length limitations.

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away