BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
sas_novice2
Calcite | Level 5

I have 60 sas7bdat datasets, around 100 million rows each and 60-70 columns, which contains quite similar columns and would like to process them altogether to filter and create a smaller dataset (containing all the columns) containing rows with specific information with the following code:

 

PROC SQL;

    CREATE TABLE bigtable AS

    SELECT * FROM DATA1

    OUTER UNION CORR

    SELECT * FROM DATA2

    .......

    OUTER UNION CORR

    SELECT * FROM DATA60;

 

    CREATE TABLE filtered AS

    SELECT * FROM bigtable

    WHERE <filtering based on 6 columns in the 'bigtable' with AND & OR>;

 

    <and then save 'filtered' in CSV>;

 

 

 

Two questions relating this process:

  1. DATA1, ..., DATA60 are sas7bdat datasets, but I get the following messages: "Data file is in a format that is native to another host". This may cause processing time to be much slower - datasets are in sas7bdat and in latin1 western (ISO) - any solutions, and why the data is not native formatted even when they're sas7bdat datasets?
  2. Is there a much faster way to process data via PROC SQL, to vertically combine and filter billions of rows?

Thank you in advance!

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
SASKiwi
PROC Star

By not filtering as you are reading your input datasets you are reading more data than necessary so your program will be slower. If you add a WHERE clause in my example code it applies equally to all datasets:

data want;
  set data1 - data60;
  where < your selection logic >;
run;

View solution in original post

4 REPLIES 4
SASKiwi
PROC Star

Where did the SAS datasets get created? You will get that message if the datasets were created in a SAS installation that is not identical to your one. For example if it runs on a different OS.

 

I'd suggest that you try using the DATA step SET statement to stack your tables as that will likely be faster than SQL. Also should filter the DATA1 to DATA60 tables if possible rather than reading all data then filtering afterwards:

data want;
  set data1 - data60;
run;
sas_novice2
Calcite | Level 5

Thank you for the response! Wouldn't filtering the dataset one by one for all 60 of them result in at least the same time spent than filtering them altogether? Filtering all the datasets together in a single step would require the filter to only traverse once, as opposed to traverse 60 times; although the same amount of rows explored - but have to re-type the filtering code for 60 times?

 

Can I use that DATA step in conjunction with PROC SQL? So after your code, using PROC SQL to filter the rows?

 

SASKiwi
PROC Star

By not filtering as you are reading your input datasets you are reading more data than necessary so your program will be slower. If you add a WHERE clause in my example code it applies equally to all datasets:

data want;
  set data1 - data60;
  where < your selection logic >;
run;
Tom
Super User Tom
Super User

There is nothing in your posted code that requires using SQL.

Although I have no idea what you mean by this phrase "filtering based on 6 columns in the 'bigtable' "

Are you trying to imply that some of the 6 variables do not exist in all of the original datasets?

 

One method that can be faster is to use PROC APPEND.

proc append data=data1 base=BIGTABLE ;
      WHERE <filtering based on 6 VARIABLES in the SMALL DATASET with AND & OR>;
run;

proc append data=data2 base=BIGTABLE ;
      WHERE <filtering based on 6 VARIABLES in the SMALL DATASET with AND & OR>;
run;
 ...

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 844 views
  • 1 like
  • 3 in conversation