DATA Step, Macro, Functions and more

Splitting a database

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 111
Accepted Solution

Splitting a database

Hello,

I have a database who looks like this:

Yearspf1pf2pf3pf4pf5pf6pf7pf8pf119
20060,999950,999840,999690,999540,999390,999240,999090,998940,99864
20070,904194750,868055130,954072930,86762550,91067820,890551090,995349840,826284740,85245713
20080,95475020,868547180,803121470,940141550,926651030,979107460,952932070,942629740,86068283
20090,843313630,815226290,989066430,887256260,948125410,859754390,937895480,984240240,82137827
20100,836634450,845078370,918888980,954478630,873232380,980276430,935589090,855538560,97474632
20110,865430870,90582210,809514780,972895180,807046430,943826760,896787330,934231980,89965292
20120,917030810,885591040,80066150,897469790,962247270,932839780,997049440,956220470,92032314
20560,919235510,988950880,947337370,889243550,844969880,881367230,866452680,887070450,9918978

I would like to create a single database for each line (without the first column, which is the name of the file), i.e.:

2006.sas7bdat:

pf1pf2pf3pf4pf5pf6pf7pf8pf119
0,999950,999840,999690,999540,999390,999240,999090,998940,99864

2007.sas7bdat:

pf1pf2pf3pf4pf5pf6pf7pf8pf119
0,904194750,868055130,954072930,86762550,91067820,890551090,995349840,826284740,85245713

etc.

Is there a way to do this with a macro or something. Otherwise, it's not very efficient to do it manually.

Thanks


Accepted Solutions
Solution
‎12-15-2012 02:37 AM
Frequent Contributor
Posts: 86

Re: Splitting a database

Posted in reply to Demographer

I would have done it this way.

%Macro temp;

Data _null_;

  Set Temp end = eof;

  count+1;

  if eof then call symput('c_yr', compress(count));

Run;

Proc sql;

select years into : yr1 - :yr&c_yr   from temp;

quit;

  %do i= 1 %to &c_yr;

      Data Y&&yr&i.;

      Set temp ;

      where years eq &&yr&i.;

     Run;

   %end;

%Mend;

%temp;

View solution in original post


All Replies
Super Contributor
Posts: 1,636

Re: Splitting a database

Posted in reply to Demographer

data have;

input year p1;

cards;

2005 234.1

2006 456.4

2007 786.1

;

proc sql noprint;

select year into:key separated by ' '

from have;

quit;

%macro dist;

%do i=1 %to %sysfunc(countw(&key));

%let dsn=%scan(&key,&i);

data _&dsn;

set have;

where year=&dsn;

drop year;

run;

%end;

%mend;

%dist

Solution
‎12-15-2012 02:37 AM
Frequent Contributor
Posts: 86

Re: Splitting a database

Posted in reply to Demographer

I would have done it this way.

%Macro temp;

Data _null_;

  Set Temp end = eof;

  count+1;

  if eof then call symput('c_yr', compress(count));

Run;

Proc sql;

select years into : yr1 - :yr&c_yr   from temp;

quit;

  %do i= 1 %to &c_yr;

      Data Y&&yr&i.;

      Set temp ;

      where years eq &&yr&i.;

     Run;

   %end;

%Mend;

%temp;

Frequent Contributor
Posts: 111

Re: Splitting a database

Thanks all. This forum is wonderful. Hopefully I would be as good programer as you are in the next years.

Respected Advisor
Posts: 3,799

Re: Splitting a database

I think if you move the WHERE statements to the DATA statement as a data set options for each data set name you generate you could create all the data sets with one pass of the data.

Frequent Contributor
Posts: 86

Re: Splitting a database

Posted in reply to data_null__

Yes, Data_null_ , you are right. that would make it fast.

Super Contributor
Posts: 1,636

Re: Splitting a database

Hi Bhpinder,

You can combine your two steps:

Data _null_;

  Set Temp end = eof;

  count+1;

  if eof then call symput('c_yr', compress(count));

Run;

Proc sql;

select years into : yr1 - :yr&c_yr   from temp;

quit;

into one:

Data _null_;

  Set Temp end = eof;

  call symputx(cats('yr',_n_),years);

  if eof then call symputx('c_yr',_n_);

Run;

Frequent Contributor
Posts: 86

Re: Splitting a database

Thank you so much Linlin. I did not know about call symputx. It's  such a great place to learn.

Contributor
Posts: 45

Re: Splitting a database

Posted in reply to Demographer

The macro solutions are great for such problems.

Incidentally, the hash-of-hashes (HOH) technique was first demonstrated by Richard DeVenezia for dataset splitting.

I have used a similar HOH approach, using dataset functions to read in data:

data have;

infile datalines dsd dlm=" ";

input Years $ (pf1-pf8) ($);

datalines;

2006 0,99995 0,99984 0,99969 0,99954 0,99939 0,99924 0,99909 0,99894  0,99864

2007 0,90419475 0,86805513 0,95407293 0,8676255 0,9106782 0,89055109 0,99534984 0,82628474  0,85245713

2008 0,9547502 0,86854718 0,80312147 0,94014155 0,92665103 0,97910746 0,95293207 0,94262974  0,86068283

2009 0,84331363 0,81522629 0,98906643 0,88725626 0,94812541 0,85975439 0,93789548 0,98424024  0,82137827

2010 0,83663445 0,84507837 0,91888898 0,95447863 0,87323238 0,98027643 0,93558909 0,85553856  0,97474632

2011 0,86543087 0,9058221 0,80951478 0,97289518 0,80704643 0,94382676 0,89678733 0,93423198  0,89965292

2012 0,91703081 0,88559104 0,8006615 0,89746979 0,96224727 0,93283978 0,99704944 0,95622047  0,92032314

;

run;

data _null_;

       length pf $15;

       if(1=2) then set have;

       declare hash p (ordered:"a");

       p.defineKey("years");

       p.defineData("years","ObjPF");

       p.defineDone();

       declare hiter hip("p");

       declare hash ObjPF;

*****Bring in data using data access functions**********************;

       dsid=open("have");

       nrows=attrn(dsid,"NOBS");

       ncols=attrn(dsid,"NVAR");

       do x=1 to nrows;

              rc=fetchobs(dsid,x);

              years=getvarc(dsid,1);

              nametext="YR"||strip(left(years));

              rc=fetchobs(dsid,x);

              do y=2 to ncols;

                     pf=getvarc(dsid,y);

              ***** Create Hash-of-Hashes*********************************;

                     if p.find() ne 0 then do;

                           ObjPF=_new_ hash(ordered:"a",multidata:"y");

                           ObjPF.defineKey("y");

                           ObjPF.defineData("pf");

                            ObjPF.defineDone();

                           p.replace();

                     end; *for p.find;

                     ObjPF.replace();

              end; *for y;

       end; *for x;

****hash-of-hashes readout and dataset creation*********************;

       do rc = hip.first() by 0 while (rc = 0);

       ObjPF.output (dataset:'YR_'||years);

       rc = hip.next();

       end; *for hip.first;

stop;

run;

PROC Star
Posts: 7,467

Re: Splitting a database

Posted in reply to joehinson

This might be an easier hash solution:

proc sql noprint;

  select cats("'",name,"'")

    into :vars

      separated by ","

        from dictionary.columns

          where libname="WORK" and

                memname="HAVE" and

                upcase(name) ne "YEARS"

  ;

quit;

data _null_ ;

  dcl hash hid (ordered: 'a') ;

  hid.definekey ('years') ;

  hid.definedata (&vars. ) ;

  hid.definedone ( ) ;

  do until ( last.years ) ;

    set have ;

    by years ;

    hid.add() ;

  end ;

  hid.output (dataset: "_"||put (years, best.-l)) ;

run;

Respected Advisor
Posts: 4,919

Re: Splitting a database

Posted in reply to Demographer

Another way, inspired by Linlin's :

data have;
input year p1;
cards;
2005 234.1
2006 456.4
2007 786.1
;

proc sql noprint;
select
     cats("_",year),
     cats("when(",year,") output _",year)
into
     :dts separated by ' ',
     :when separated by ';'
from have;
quit;

data &dts.;
set have;
select (year);
     &when.;
end;
drop year;
run;

PG

PG
PROC Star
Posts: 7,467

Re: Splitting a database

: for the OP's needs, I think that yours is the easiest to understand solution.  However, I'm going to continue recommending using a hash for such tasks, so that one doesn't have to be concerned with macro variable or line length limitations.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 470 views
  • 8 likes
  • 7 in conversation