BookmarkSubscribeRSS Feed
paulrockliffe
Obsidian | Level 7

I have a complicated setup with some data I need to pull into Enterprise Guide.  I basically get a new table created on the server each month by a process, it contains all the information created in just that month.  There is information there that relates to a huge number of entities, I'm interested in a subset of those entities only.

 

I need to get that subset of information for the 60 latest tables.  There's a lot of data so I can't practically pull the data from 60 tables and then filter it, I need to only pull the data I need in the first place.

 

Each entity has an ID, I have a table in EG that lists the IDs I'm interested in.

 

I have used Excel to generate SQL that will combine the tables I'm interested in, but I'm stuck when trying to filter the data at the same time.  The SQL I have is in this format:

 

select * from TABLE.MONTH1 union all select * from TABLE.MONTH2 union all select * from TABLE.MONTH3 etc etc etc

 

How can I modify the query so that it only takes values where ID is in WORK.IDs?

 

Thanks!

9 REPLIES 9
TomKari
Onyx | Level 15

Just to make sure we understand what you're trying to do:

 

The following code is kind of a simulation of what I think you're after, with cars taking the place of your tables, and horsepower taking the place of ID.

 

Is this the situation you're trying to create?

 

Tom

 

data cars1; set sashelp.cars;
data cars2; set sashelp.cars;
data cars3; set sashelp.cars;
run;

proc sql noprint;
	create table Want as
		select * from Work.Cars1 where Horsepower in(225, 270) union all 
		select * from Work.Cars2 where Horsepower in(225, 270) union all
		select * from Work.Cars3 where Horsepower in(225, 270);
quit;
paulrockliffe
Obsidian | Level 7

Hello, yes that's what I'm aiming for, except the horsepower values are dynamic and contained within another table.  There are about 30,000 horsepower values, so I can't do this in any way that involves typing them out.

 

The cars names follow a pattern, linked to the month they relate to, so they can't be typed either, but I was initially getting around that by using a hack in Excel to create that SQL. 

 

I'm happy to carry on hacking in Excel, if it's possible to do the filter entirely in proc sql, I've just hit a brick wall when I realised I couldn't extract all the tables and join them before I filter them because of the volume of data.

TomKari
Onyx | Level 15

Ouch, that's ugly! My original plan was to drop the ID values into a macro variable, and just use that in the "in" clause. But 30,000 entries is too many for that approach. I'll put my thinking cap on again.

 

Tom

TomKari
Onyx | Level 15

Okay, didn't work out as badly as I was afraid it would.

 

Is the following practical for what you need? If this works in concept, setting up a macro %do loop to handle the multiple tables is pretty easy.

 

Tom

 

data IDTable;
	input ID;
	cards;
225
270
run;

/* Load these values to your database */

/*
proc sql noprint;
create table mydb.IDTable as
select ID from IDTable;
quit;
*/
data cars1;
	set sashelp.cars(rename=(Horsepower = ID));

data cars2;
	set sashelp.cars(rename=(Horsepower = ID));

data cars3;
	set sashelp.cars(rename=(Horsepower = ID));
run;

/* use mydb.IDTable instead of work.IDTable */
proc sql noprint;
	create table Want as
		select t1.* from Work.Cars1 t1 inner join work.IDTable m1 on(t1.ID = m1.ID) union all
		select t2.* from Work.Cars2 t2 inner join work.IDTable m2 on(t2.ID = m2.ID) union all
		select t3.* from Work.Cars3 t3 inner join work.IDTable m3 on(t3.ID = m3.ID);
quit;
Patrick
Opal | Level 21

@paulrockliffe 

In case your source data "on the server" is not stored in a database but is a set of monthly SAS tables then below code should do the job.

data IDTable;
  input ID;
  cards;
225
270
;
run;

data cars_201812 cars_201901 cars_201902 cars_201903;
  set sashelp.cars(rename=(Horsepower = ID));
run;

data want;
  if _n_=1 then
    do;
      dcl hash h1(dataset:'IDTable');
      h1.defineKey('id');
      h1.defineDone();
    end;
  set cars_201901 - cars_201903;
  if h1.check()=0 then output;
run;

If you also provide us with the naming pattern of these monthly tables then we can show you how to dynamically select the desired tables based on current date.

There is likely no need to use Excel to generate code. The SAS Macro language allows you to do this.

LNEW
Obsidian | Level 7

Here's a thought:

Proc sql;

create table first_set_only as

select var1, var2 from dsn1 where id in (select id from work.ids); quit;

%let set_list = dsn2 dsn3...dsn60;

%macro app;

do i = 1 %to 59;

 

Proc sql;

create table &set_list._1 as

select var1, var2 from &set_list where id in (select id from work.ids); quit;

 

proc append base = first_set_only data = &set_list._1 force;

proc append base = first_set_only data = &set_list._1 force;

proc append base = first_set_only date = &set_list._1 force;

%end;

 

%mend;

%app;

 

paulrockliffe
Obsidian | Level 7

Thanks for all the help and sorry I'm picking this up months down the line.  This ended up being less important than thought so I was bodging it by asking someone to extract using SQL access rather than EG. 

 

IT have decided to bring in a new system and not migrate data, so I'm now having to achieve this across two data sources and can't use SQL for the second, so it's back up the list of things I need to get fixed again!

 

I think I can get somewhere with the answers so far, the bit I'm not sure about is setting the data range that is being harvested.  The format of the monthly tables is f_ret199601, f_ret199602, etc, where f_ret is meaningless, 1996 is the year and 01 is the first month of the tax year, so 01 = April.

 

Ideally what I'd like to do is to set a date range using Prompts and then use those to identify all the monthly returns in the period and return a table that is amended.

 

Thanks again!

Patrick
Opal | Level 21

I suggest you ask a new question. Be very clear what you have and what you need. Also provide some representative sample data.

 

"so I'm now having to achieve this across two data sources and can't use SQL for the second,"

Why not? Just be clear where the data lives (database tables, SAS table, external file), the volumes you're dealing with and where you need the result.

 

It's not that hard to generate SAS/SQL code using SAS and it's also not that hard to generate the table names you need. But we do need very clear information from you and ideally also some sample data (provided via working SAS data steps) so we can provide tested code.

 

paulrockliffe
Obsidian | Level 7

Sorry, that wasn't clear was it.  The second source just isn't accessible from anywhere other than EG, so when that starts holding this data I'll need a non-SQL solution.  So I might as well sort it for the first data source now so I'm ahead of the curve.

 

So the second data source is irrelevant for now, but I could do with a working answer to my original question.  I can then adapt it to work on the second data source when I have some spare time and merge the results of both so data flows seamlessly when the new system is live. Saves me keeping an eye on it and rushing at the last minute to fix it.

 

Anyway, the tables sit in a Library on my SAS Server, Server is "SASServer", Library is "Data", then the tables are as I described.

 

I want my result to be created on the same server in SASUSER.TableName.  

 

In terms of the volume of data, each monthly table contains data for several million entities, but my lookup table is a subset of a few thousand entities.  So there's a need to filter as I go rather than bringing the whole lot in to one table before filtering.

 

Does that make sense?

 

Thanks!

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 7635 views
  • 1 like
  • 4 in conversation