Help using Base SAS procedures

Combine data sets-which is the fastest method?

Reply
Contributor mei
Contributor
Posts: 62

Combine data sets-which is the fastest method?

I have downloaded daily stock prices from year 2002-2011 in piecemeals due to ease of downloading. They are now in 48 datasets in the name of year_month or year_quarters eg 2002_01 (jan of 2002), 2005_07_09 (from month july to sept 2005). I need to combine them in one master datasets- 2002-2011 for data analysis. They are in identical variables and structure.

I understand that i can combine data sets using appending or concatenating. But considering the no of files i have, what is the fastest method?

Or should I firstly concatenate 2 data sets in 1 masterdata sets , then use appending to add all other datasets in this masterdata sets?

can help me in writing the data steps?

Thanks a lot

mei

Super Contributor
Posts: 1,636

Combine data sets-which is the fastest method?

you can try the code below If your library only contains the 48 datasets. libname must be in capital letters.

proc sql noprint;

  select memname into : names separated by ' '

     from dictionary.tables

             where libname='YOUR_LIBRARY'; /* have to be capital letters */

quit;

%put &names;

data want;

   set &names;

run;

Linlin

Contributor mei
Contributor
Posts: 62

Combine data sets-which is the fastest method?

Thanks linlin.

I wish to clarify:

my master dataset is dailyrtn_2002_2011

the datasets that i need to combine are for example: dailyrtn_2002_01,.... dailyrtn_2002_02, dailyrtn_2003_01_03.......dailyrtn_2011_09_12.

1. DO I write the prog as below:

select memname into dailyrtn_2002_2011

from dictionary.tables

where libname='CRSP';

quit;

%put &name' - **i m not sure what name to insert here**

data want;

set &names; **do i put down all data sets of dailyrtn_2002_01 dailyrtn_2002_02 etc**

run;

2. do i need to assign a library for the final output dailyrtn_2002_2011?

thanks

Respected Advisor
Posts: 4,930

Combine data sets-which is the fastest method?

Linlin, you could add a condition such as upcase(memname) like "DAILYRTN_20%" as a precaution.

PG

PG
Contributor mei
Contributor
Posts: 62

Combine data sets-which is the fastest method?

hi PGStats,

can you help me in writing the part on memname?

Linlin wrote

select memname into names separated by ' ';

should i just type this in the program?

thanks

mei

Frequent Contributor
Posts: 139

Combine data sets-which is the fastest method?

proc sql noprint;

  select memname into : names separated by ' '

     from dictionary.tables

             where upcase(memname) like "DAILYRTN_20%"; /* have to be capital letters */

quit;

%put &names;

data want;

   set &names;

run;

Contributor mei
Contributor
Posts: 62

Combine data sets-which is the fastest method?

Posted in reply to manojinpec

Dear manojinpec,

this is my error log after putting exactly your programme above:

76

77   %put &names;

DAILYRTN_2002_01 DAILYRTN_2002_02 DAILYRTN_2002_03 DAILYRTN_2002_04_06 DAILYRTN_2002_07

DAILYRTN_2002_08_09 DAILYRTN_2002_10_12 DAILYRTN_2003_01 DAILYRTN_2003_02 DAILYRTN_2003_03

DAILYRTN_2003_04 DAILYRTN_2003_05 DAILYRTN_2003_06 DAILYRTN_2003_07_09 DAILYRTN_2003_10_12

DAILYRTN_2004_01_03 DAILYRTN_2004_04_06 DAILYRTN_2004_07_09 DAILYRTN_2004_10_12

DAILYRTN_2005_01_03 DAILYRTN_2005_04_06 DAILYRTN_2005_07_09 DAILYRTN_2005_10_12

DAILYRTN_2006_01_03 DAILYRTN_2006_04_06 DAILYRTN_2006_07_09 DAILYRTN_2006_10_12

DAILYRTN_2007_01_03 DAILYRTN_2007_04_06 DAILYRTN_2007_07_09 DAILYRTN_2007_10_12

DAILYRTN_2008_01_03 DAILYRTN_2008_04_06 DAILYRTN_2008_07_09 DAILYRTN_2008_10

DAILYRTN_2008_11_12 DAILYRTN_2009_01_03 DAILYRTN_2009_04_06 DAILYRTN_2009_07_09

DAILYRTN_2009_10_12 DAILYRTN_2010_01_03 DAILYRTN_2010_04_06 DAILYRTN_2010_07_09

DAILYRTN_2010_10_12 DAILYRTN_2011_01_03 DAILYRTN_2011_04_06 DAILYRTN_2011_07_09

DAILYRTN_2011_10_12 DAILYRTN_2002_2011

78

79

80

81

82

83   data want;

84

85      set &names;

ERROR: File WORK.DAILYRTN_2002_01.DATA does not exist.

ERROR: File WORK.DAILYRTN_2002_02.DATA does not exist.

ERROR: File WORK.DAILYRTN_2002_03.DATA does not exist.

ERROR: File WORK.DAILYRTN_2002_04_06.DATA does not exist.

ERROR: File WORK.DAILYRTN_2002_07.DATA does not exist.

ERROR: File WORK.DAILYRTN_2002_08_09.DATA does not exist.

ERROR: File WORK.DAILYRTN_2002_10_12.DATA does not exist.

ERROR: File WORK.DAILYRTN_2003_01.DATA does not exist.

ERROR: File WORK.DAILYRTN_2003_02.DATA does not exist.

ERROR: File WORK.DAILYRTN_2003_03.DATA does not exist.

ERROR: File WORK.DAILYRTN_2003_04.DATA does not exist.

ERROR: File WORK.DAILYRTN_2003_05.DATA does not exist.

ERROR: File WORK.DAILYRTN_2003_06.DATA does not exist.

ERROR: File WORK.DAILYRTN_2003_07_09.DATA does not exist.

ERROR: File WORK.DAILYRTN_2003_10_12.DATA does not exist.

ERROR: File WORK.DAILYRTN_2004_01_03.DATA does not exist.

ERROR: File WORK.DAILYRTN_2004_04_06.DATA does not exist.

ERROR: File WORK.DAILYRTN_2004_07_09.DATA does not exist.

ERROR: File WORK.DAILYRTN_2004_10_12.DATA does not exist.

ERROR: File WORK.DAILYRTN_2005_01_03.DATA does not exist.

ERROR: File WORK.DAILYRTN_2005_04_06.DATA does not exist.

ERROR: File WORK.DAILYRTN_2005_07_09.DATA does not exist.

ERROR: File WORK.DAILYRTN_2005_10_12.DATA does not exist.

ERROR: File WORK.DAILYRTN_2006_01_03.DATA does not exist.

ERROR: File WORK.DAILYRTN_2006_04_06.DATA does not exist.

ERROR: File WORK.DAILYRTN_2006_07_09.DATA does not exist.

ERROR: File WORK.DAILYRTN_2006_10_12.DATA does not exist.

ERROR: File WORK.DAILYRTN_2007_01_03.DATA does not exist.

ERROR: File WORK.DAILYRTN_2007_04_06.DATA does not exist.

ERROR: File WORK.DAILYRTN_2007_07_09.DATA does not exist.

ERROR: File WORK.DAILYRTN_2007_10_12.DATA does not exist.

ERROR: File WORK.DAILYRTN_2008_01_03.DATA does not exist.

ERROR: File WORK.DAILYRTN_2008_04_06.DATA does not exist.

ERROR: File WORK.DAILYRTN_2008_07_09.DATA does not exist.

ERROR: File WORK.DAILYRTN_2008_10.DATA does not exist.

ERROR: File WORK.DAILYRTN_2008_11_12.DATA does not exist.

ERROR: File WORK.DAILYRTN_2009_01_03.DATA does not exist.

ERROR: File WORK.DAILYRTN_2009_04_06.DATA does not exist.

ERROR: File WORK.DAILYRTN_2009_07_09.DATA does not exist.

ERROR: File WORK.DAILYRTN_2009_10_12.DATA does not exist.

ERROR: File WORK.DAILYRTN_2010_01_03.DATA does not exist.

ERROR: File WORK.DAILYRTN_2010_04_06.DATA does not exist.

ERROR: File WORK.DAILYRTN_2010_07_09.DATA does not exist.

ERROR: File WORK.DAILYRTN_2010_10_12.DATA does not exist.

ERROR: File WORK.DAILYRTN_2011_01_03.DATA does not exist.

ERROR: File WORK.DAILYRTN_2011_04_06.DATA does not exist.

ERROR: File WORK.DAILYRTN_2011_07_09.DATA does not exist.

ERROR: File WORK.DAILYRTN_2011_10_12.DATA does not exist.

86

87   run;

NOTE: The SAS System stopped processing this step because of errors.

WARNING: The data set WORK.WANT may be incomplete.  When this step was stopped there were 0

         observations and 0 variables.

WARNING: Data set WORK.WANT was not replaced because this step was stopped.

NOTE: DATA statement used (Total process time):

      real time           0.06 seconds

      cpu time            0.03 seconds

What should i do to rectify?

thanks

Frequent Contributor
Posts: 139

Combine data sets-which is the fastest method?

libname abc "YOURPATH";

proc sql noprint;

  select memname into : names separated by ' '

     from dictionary.tables

             where upcase(memname) like "DAILYRTN_20%" and libname=abc;

quit;

%put &names;

data want;

   set &names;

run;

Contributor mei
Contributor
Posts: 62

Combine data sets-which is the fastest method?

Posted in reply to manojinpec

Dear manojinpec,

based on your program, i noted an error message as bold below:

but the output is still produced.

you have any clue?

thanks

meiy

log:

libname CRSP 'path';

115  proc sql noprint;

116

117    select memname into : names separated by ' '

118

119       from dictionary.tables

120

121               where upcase(memname) like "DAILYRTN_20%" and libname=CRSP;

ERROR: The following columns were not found in the contributing tables: CRSP.


122

123  quit;

NOTE: The SAS System stopped processing this step because of errors.

NOTE: PROCEDURE SQL used (Total process time):

      real time           0.05 seconds

      cpu time            0.01 seconds

124

125  %put &names;

CRSP.DAILYRTN_2002_01 CRSP.DAILYRTN_2002_02 CRSP.DAILYRTN_2002_03 CRSP.DAILYRTN_2002_04_06

CRSP.DAILYRTN_2002_07 CRSP.DAILYRTN_2002_08_09 CRSP.DAILYRTN_2002_10_12 CRSP.DAILYRTN_2003_01

CRSP.DAILYRTN_2003_02 CRSP.DAILYRTN_2003_03 CRSP.DAILYRTN_2003_04 CRSP.DAILYRTN_2003_05

CRSP.DAILYRTN_2003_06 CRSP.DAILYRTN_2003_07_09 CRSP.DAILYRTN_2003_10_12

CRSP.DAILYRTN_2004_01_03 CRSP.DAILYRTN_2004_04_06 CRSP.DAILYRTN_2004_07_09

CRSP.DAILYRTN_2004_10_12 CRSP.DAILYRTN_2005_01_03 CRSP.DAILYRTN_2005_04_06

CRSP.DAILYRTN_2005_07_09 CRSP.DAILYRTN_2005_10_12 CRSP.DAILYRTN_2006_01_03

CRSP.DAILYRTN_2006_04_06 CRSP.DAILYRTN_2006_07_09 CRSP.DAILYRTN_2006_10_12

CRSP.DAILYRTN_2007_01_03 CRSP.DAILYRTN_2007_04_06 CRSP.DAILYRTN_2007_07_09

CRSP.DAILYRTN_2007_10_12 CRSP.DAILYRTN_2008_01_03 CRSP.DAILYRTN_2008_04_06

CRSP.DAILYRTN_2008_07_09 CRSP.DAILYRTN_2008_10 CRSP.DAILYRTN_2008_11_12

CRSP.DAILYRTN_2009_01_03 CRSP.DAILYRTN_2009_04_06 CRSP.DAILYRTN_2009_07_09

CRSP.DAILYRTN_2009_10_12 CRSP.DAILYRTN_2010_01_03 CRSP.DAILYRTN_2010_04_06

CRSP.DAILYRTN_2010_07_09 CRSP.DAILYRTN_2010_10_12 CRSP.DAILYRTN_2011_01_03

CRSP.DAILYRTN_2011_04_06 CRSP.DAILYRTN_2011_07_09 CRSP.DAILYRTN_2011_10_12

126

127

128

129

130

131  data want2;

132

133     set &names;

NOTE: Data file CRSP.DAILYRTN_2002_02.DATA is in a format that is native to another host, or

      the file encoding does not match the session encoding. Cross Environment Data Access will

      be used, which might require additional CPU resources and might reduce performance.

NOTE: Data file CRSP.DAILYRTN_2002_03.DATA is in a format that is native to another host, or

      the file encoding does not match the session encoding. Cross Environment Data Access will

      be used, which might require additional CPU resources and might reduce performance.

NOTE: Data file CRSP.DAILYRTN_2002_04_06.DATA is in a format that is native to another host, or

      the file encoding does not match the session encoding. Cross Environment Data Access will

      be used, which might require additional CPU resources and might reduce performance.

NOTE: Data file CRSP.DAILYRTN_2002_07.DATA is in a format that is native to another host, or

      the file encoding does not match the session encoding. Cross Environment Data Access will

      be used, which might require additional CPU resources and might reduce performance.

NOTE: Data file CRSP.DAILYRTN_2002_08_09.DATA is in a format that is native to another host, or

      the file encoding does not match the session encoding. Cross Environment Data Access will

      be used, which might require additional CPU resources and might reduce performance.

NOTE: Data file CRSP.DAILYRTN_2002_10_12.DATA is in a format that is native to another host, or

      the file encoding does not match the session encoding. Cross Environment Data Access will

      be used, which might require additional CPU resources and might reduce performance.

134

135  run;

WARNING: Multiple lengths were specified for the variable DATE by input data set(s). This may

         cause truncation of data.

WARNING: Multiple lengths were specified for the variable NEXTDT by input data set(s). This may

         cause truncation of data.

WARNING: Multiple lengths were specified for the variable DLPDT by input data set(s). This may

         cause truncation of data.

WARNING: Multiple lengths were specified for the variable RCRDDT by input data set(s). This may

         cause truncation of data.

WARNING: Multiple lengths were specified for the variable PAYDT by input data set(s). This may

         cause truncation of data.

WARNING: Multiple lengths were specified for the variable VWRETD by input data set(s). This may

         cause truncation of data.

WARNING: Multiple lengths were specified for the variable VWRETX by input data set(s). This may

         cause truncation of data.

WARNING: Multiple lengths were specified for the variable EWRETD by input data set(s). This may

         cause truncation of data.

WARNING: Multiple lengths were specified for the variable EWRETX by input data set(s). This may

         cause truncation of data.

WARNING: Multiple lengths were specified for the variable SPRTRN by input data set(s). This may

         cause truncation of data.

NOTE: There were 159035 observations read from the data set CRSP.DAILYRTN_2002_01.

NOTE: There were 143067 observations read from the data set CRSP.DAILYRTN_2002_02.

NOTE: There were 149804 observations read from the data set CRSP.DAILYRTN_2002_03.

NOTE: There were 474570 observations read from the data set CRSP.DAILYRTN_2002_04_06.

NOTE: There were 161829 observations read from the data set CRSP.DAILYRTN_2002_07.

NOTE: There were 306133 observations read from the data set CRSP.DAILYRTN_2002_08_09.

NOTE: There were 462178 observations read from the data set CRSP.DAILYRTN_2002_10_12.

NOTE: There were 150397 observations read from the data set CRSP.DAILYRTN_2003_01.

NOTE: There were 135381 observations read from the data set CRSP.DAILYRTN_2003_02.

NOTE: There were 148643 observations read from the data set CRSP.DAILYRTN_2003_03.

NOTE: There were 147568 observations read from the data set CRSP.DAILYRTN_2003_04.

NOTE: There were 146879 observations read from the data set CRSP.DAILYRTN_2003_05.

NOTE: There were 146244 observations read from the data set CRSP.DAILYRTN_2003_06.

NOTE: There were 441909 observations read from the data set CRSP.DAILYRTN_2003_07_09.

NOTE: There were 439730 observations read from the data set CRSP.DAILYRTN_2003_10_12.

NOTE: There were 425097 observations read from the data set CRSP.DAILYRTN_2004_01_03.

NOTE: There were 424541 observations read from the data set CRSP.DAILYRTN_2004_04_06.

NOTE: There were 437966 observations read from the data set CRSP.DAILYRTN_2004_07_09.

NOTE: There were 438980 observations read from the data set CRSP.DAILYRTN_2004_10_12.

NOTE: There were 418163 observations read from the data set CRSP.DAILYRTN_2005_01_03.

NOTE: There were 438524 observations read from the data set CRSP.DAILYRTN_2005_04_06.

NOTE: There were 439167 observations read from the data set CRSP.DAILYRTN_2005_07_09.

NOTE: There were 431892 observations read from the data set CRSP.DAILYRTN_2005_10_12.

NOTE: There were 423364 observations read from the data set CRSP.DAILYRTN_2006_01_03.

NOTE: There were 430474 observations read from the data set CRSP.DAILYRTN_2006_04_06.

NOTE: There were 433051 observations read from the data set CRSP.DAILYRTN_2006_07_09.

NOTE: There were 435870 observations read from the data set CRSP.DAILYRTN_2006_10_12.

NOTE: There were 424165 observations read from the data set CRSP.DAILYRTN_2007_01_03.

NOTE: There were 441145 observations read from the data set CRSP.DAILYRTN_2007_04_06.

NOTE: There were 442105 observations read from the data set CRSP.DAILYRTN_2007_07_09.

NOTE: There were 450231 observations read from the data set CRSP.DAILYRTN_2007_10_12.

NOTE: There were 429142 observations read from the data set CRSP.DAILYRTN_2008_01_03.

NOTE: There were 447544 observations read from the data set CRSP.DAILYRTN_2008_04_06.

NOTE: There were 445165 observations read from the data set CRSP.DAILYRTN_2008_07_09.

NOTE: There were 158441 observations read from the data set CRSP.DAILYRTN_2008_10.

NOTE: There were 281067 observations read from the data set CRSP.DAILYRTN_2008_11_12.

NOTE: There were 412231 observations read from the data set CRSP.DAILYRTN_2009_01_03.

NOTE: There were 419137 observations read from the data set CRSP.DAILYRTN_2009_04_06.

NOTE: There were 421586 observations read from the data set CRSP.DAILYRTN_2009_07_09.

NOTE: There were 421584 observations read from the data set CRSP.DAILYRTN_2009_10_12.

NOTE: There were 401639 observations read from the data set CRSP.DAILYRTN_2010_01_03.

NOTE: There were 415637 observations read from the data set CRSP.DAILYRTN_2010_04_06.

NOTE: There were 422590 observations read from the data set CRSP.DAILYRTN_2010_07_09.

NOTE: There were 422837 observations read from the data set CRSP.DAILYRTN_2010_10_12.

NOTE: There were 411505 observations read from the data set CRSP.DAILYRTN_2011_01_03.

NOTE: There were 420917 observations read from the data set CRSP.DAILYRTN_2011_04_06.

NOTE: There were 429510 observations read from the data set CRSP.DAILYRTN_2011_07_09.

NOTE: There were 423344 observations read from the data set CRSP.DAILYRTN_2011_10_12.

NOTE: The data set WORK.WANT2 has 17331978 observations and 58 variables.

NOTE: DATA statement used (Total process time):

      real time           4:36.54

      cpu time            43.47 seconds

Frequent Contributor
Posts: 139

Combine data sets-which is the fastest method?

Hi it should be in quotes. "CRSP"

Super User
Posts: 3,257

Combine data sets-which is the fastest method?

Your datasets are not in a WORK library but in a library you reference with a LIBREF of CRSP, so you need to insert this into your dataset list as well:

proc sql noprint;

  select strip(libname) !! '.' !! strip(memname) into :names separated by ' '

     from dictionary.tables

             where libname='CRSP';

quit;

%put &names;

Contributor mei
Contributor
Posts: 62

Combine data sets-which is the fastest method?

Dear SASkiwi

this is the log:

do you think the note and warning in the log has any impact to the output?

NOTE: Data file CRSP.DAILYRTN_2002_02.DATA is in a format that is native to another host, or

      the file encoding does not match the session encoding. Cross Environment Data Access will

      be used, which might require additional CPU resources and might reduce performance.

NOTE: Data file CRSP.DAILYRTN_2002_03.DATA is in a format that is native to another host, or

      the file encoding does not match the session encoding. Cross Environment Data Access will

      be used, which might require additional CPU resources and might reduce performance.

NOTE: Data file CRSP.DAILYRTN_2002_04_06.DATA is in a format that is native to another host, or

      the file encoding does not match the session encoding. Cross Environment Data Access will

      be used, which might require additional CPU resources and might reduce performance.

NOTE: Data file CRSP.DAILYRTN_2002_07.DATA is in a format that is native to another host, or

      the file encoding does not match the session encoding. Cross Environment Data Access will

      be used, which might require additional CPU resources and might reduce performance.

NOTE: Data file CRSP.DAILYRTN_2002_08_09.DATA is in a format that is native to another host, or

      the file encoding does not match the session encoding. Cross Environment Data Access will

      be used, which might require additional CPU resources and might reduce performance.

NOTE: Data file CRSP.DAILYRTN_2002_10_12.DATA is in a format that is native to another host, or

      the file encoding does not match the session encoding. Cross Environment Data Access will

      be used, which might require additional CPU resources and might reduce performance.

113

114  run;

WARNING: Multiple lengths were specified for the variable DATE by input data set(s). This may

         cause truncation of data.

WARNING: Multiple lengths were specified for the variable NEXTDT by input data set(s). This may

         cause truncation of data.

WARNING: Multiple lengths were specified for the variable DLPDT by input data set(s). This may

         cause truncation of data.

WARNING: Multiple lengths were specified for the variable RCRDDT by input data set(s). This may

         cause truncation of data.

WARNING: Multiple lengths were specified for the variable PAYDT by input data set(s). This may

         cause truncation of data.

WARNING: Multiple lengths were specified for the variable VWRETD by input data set(s). This may

         cause truncation of data.

WARNING: Multiple lengths were specified for the variable VWRETX by input data set(s). This may

         cause truncation of data.

WARNING: Multiple lengths were specified for the variable EWRETD by input data set(s). This may

         cause truncation of data.

WARNING: Multiple lengths were specified for the variable EWRETX by input data set(s). This may

         cause truncation of data.

WARNING: Multiple lengths were specified for the variable SPRTRN by input data set(s). This may

         cause truncation of data.

NOTE: There were 159035 observations read from the data set CRSP.DAILYRTN_2002_01.

NOTE: There were 143067 observations read from the data set CRSP.DAILYRTN_2002_02.

NOTE: There were 149804 observations read from the data set CRSP.DAILYRTN_2002_03.

NOTE: There were 474570 observations read from the data set CRSP.DAILYRTN_2002_04_06.

NOTE: There were 161829 observations read from the data set CRSP.DAILYRTN_2002_07.

NOTE: There were 306133 observations read from the data set CRSP.DAILYRTN_2002_08_09.

NOTE: There were 462178 observations read from the data set CRSP.DAILYRTN_2002_10_12.

NOTE: There were 150397 observations read from the data set CRSP.DAILYRTN_2003_01.

NOTE: There were 135381 observations read from the data set CRSP.DAILYRTN_2003_02.

NOTE: There were 148643 observations read from the data set CRSP.DAILYRTN_2003_03.

NOTE: There were 147568 observations read from the data set CRSP.DAILYRTN_2003_04.

NOTE: There were 146879 observations read from the data set CRSP.DAILYRTN_2003_05.

NOTE: There were 146244 observations read from the data set CRSP.DAILYRTN_2003_06.

NOTE: There were 441909 observations read from the data set CRSP.DAILYRTN_2003_07_09.

NOTE: There were 439730 observations read from the data set CRSP.DAILYRTN_2003_10_12.

NOTE: There were 425097 observations read from the data set CRSP.DAILYRTN_2004_01_03.

NOTE: There were 424541 observations read from the data set CRSP.DAILYRTN_2004_04_06.

NOTE: There were 437966 observations read from the data set CRSP.DAILYRTN_2004_07_09.

NOTE: There were 438980 observations read from the data set CRSP.DAILYRTN_2004_10_12.

NOTE: There were 418163 observations read from the data set CRSP.DAILYRTN_2005_01_03.

NOTE: There were 438524 observations read from the data set CRSP.DAILYRTN_2005_04_06.

NOTE: There were 439167 observations read from the data set CRSP.DAILYRTN_2005_07_09.

NOTE: There were 431892 observations read from the data set CRSP.DAILYRTN_2005_10_12.

NOTE: There were 423364 observations read from the data set CRSP.DAILYRTN_2006_01_03.

NOTE: There were 430474 observations read from the data set CRSP.DAILYRTN_2006_04_06.

NOTE: There were 433051 observations read from the data set CRSP.DAILYRTN_2006_07_09.

NOTE: There were 435870 observations read from the data set CRSP.DAILYRTN_2006_10_12.

NOTE: There were 424165 observations read from the data set CRSP.DAILYRTN_2007_01_03.

NOTE: There were 441145 observations read from the data set CRSP.DAILYRTN_2007_04_06.

NOTE: There were 442105 observations read from the data set CRSP.DAILYRTN_2007_07_09.

NOTE: There were 450231 observations read from the data set CRSP.DAILYRTN_2007_10_12.

NOTE: There were 429142 observations read from the data set CRSP.DAILYRTN_2008_01_03.

NOTE: There were 447544 observations read from the data set CRSP.DAILYRTN_2008_04_06.

NOTE: There were 445165 observations read from the data set CRSP.DAILYRTN_2008_07_09.

NOTE: There were 158441 observations read from the data set CRSP.DAILYRTN_2008_10.

NOTE: There were 281067 observations read from the data set CRSP.DAILYRTN_2008_11_12.

NOTE: There were 412231 observations read from the data set CRSP.DAILYRTN_2009_01_03.

NOTE: There were 419137 observations read from the data set CRSP.DAILYRTN_2009_04_06.

NOTE: There were 421586 observations read from the data set CRSP.DAILYRTN_2009_07_09.

NOTE: There were 421584 observations read from the data set CRSP.DAILYRTN_2009_10_12.

NOTE: There were 401639 observations read from the data set CRSP.DAILYRTN_2010_01_03.

NOTE: There were 415637 observations read from the data set CRSP.DAILYRTN_2010_04_06.

NOTE: There were 422590 observations read from the data set CRSP.DAILYRTN_2010_07_09.

NOTE: There were 422837 observations read from the data set CRSP.DAILYRTN_2010_10_12.

NOTE: There were 411505 observations read from the data set CRSP.DAILYRTN_2011_01_03.

NOTE: There were 420917 observations read from the data set CRSP.DAILYRTN_2011_04_06.

NOTE: There were 429510 observations read from the data set CRSP.DAILYRTN_2011_07_09.

NOTE: There were 423344 observations read from the data set CRSP.DAILYRTN_2011_10_12.

NOTE: The data set WORK.WANT has 17331978 observations and 58 variables.

NOTE: DATA statement used (Total process time):

      real time           4:53.08

      cpu time            49.37 seconds

Frequent Contributor
Posts: 139

Combine data sets-which is the fastest method?

I guesss notes doesn't have any issues howevrr Warnings might.

Actually the columns specified in the warnings may have different lengths.you just need to check the output dataset and input datasets and make the input columns of same length.

Thanks

Contributor mei
Contributor
Posts: 62

Combine data sets-which is the fastest method?

Posted in reply to manojinpec

Thanks all!!!

I believe i have got the output i want. Smiley Happy
and this is just the first step of my analysis.

I guess i will post more q in the future:smileylaugh:..Thanks all in advance!!

mei

Respected Advisor
Posts: 4,173

Re: Combine data sets-which is the fastest method?

As the WARNING tells you truncation of values may occur.

What happens: The variable attributes (like the length) from the first data set in the list of source datasets are taken. So for example if your variable DATE in the first data set has a length of character 9 with a value like 28Mar2012 but in the second data set the same variable has a length of character 10 with a value like 28/03/2012 then the values from the second dataset will be truncated to 9 characters and in your result data set it will look like 28/03/200.


So yes, it IS an issue and you will need some additional coding (basically query dictionary.columns and create an attrib statement with the max length values per variable).


proc sql noprint;

  select strip(libname) !! '.' !! strip(memname) into :names separated by ' '
     from dictionary.tables
     where libname='CRSP';

  select
    strip(name)||' '||case(type) when('char') then cats('$',length) else cats(length) end into :maxlengths separated by ' '
  from dictionary.columns
  where libname='CRSP' and memname like "DAILYRTN_20%"

  group by name
  having max(length)=length
  ;

quit;
%put names= &names;
%put maxlengths= &maxlengths;

data want;
  length &maxlengths;
  set &names;
run;

Ask a Question
Discussion stats
  • 16 replies
  • 553 views
  • 8 likes
  • 6 in conversation