BookmarkSubscribeRSS Feed
gabrielstraliot
Calcite | Level 5

Hello, i need to do a DATA STEP filtering data from another table.

This example table contains the variables that I use for the filter:

DATA WORK.TEAMS;
SET SASHELP.BASEBALL (KEEP=TEAM WHERE=(TEAM IN ('Cleveland', 'Atlanta', 'Boston')));
RUN;


I'm currently using select sql, however, it doesn't perform well. I want to use data step for this.

PROC SQL;
CREATE TABLE WORK.LIST_SQL AS
SELECT
t1.Name,
t1.Team,
t1.League,
t1.Division,
t1.Position
FROM SASHELP.BASEBALL T1
INNER JOIN WORK.TEAMS T2 ON T1.TEAM = T2.TEAM
;QUIT;

I can't use a variable like this below because the values i have ​​are dynamic

%LET TEAM = 'Cleveland', 'Atlanta', 'Boston';

DATA WORK.LIST_SAS;
SET SASHELP.BASEBALL
(WHERE=(TEAM IN (&TEAM.)));
RUN;

 

For context, I want to filter financial transactions from codes, however, these codes vary from month to month.

 

I can transform the data of this table (TEAMS) into a variable? what is the best alternative?

3 REPLIES 3
ballardw
Super User

When you say "codes vary from month to month" do you mean that the codes in the data change or the codes that you want to use change?

Also from "month to month" brings up a question of are you using all the codes for a given month? (We do not have your data or your experience so we do not know if this condition holds.) If so it might be easier to specify dates than a bunch of codes.

 

I think your basic join is inefficient. Filter first instead of applying a where after the join. A left join can select just the records you want a bit quicker I think. Please see:

data teamfilter;
   input team $14.;
datalines;
Cleveland
Atlanta
Boston
;

PROC SQL;
   CREATE TABLE WORK.LIST_SQL AS
   SELECT
   t1.Name,
   t1.Team,
   t1.League,
   t1.Division,
   t1.Position
   FROM work.teamfilter as a
        left join
        SASHELP.BASEBALL T1
       on T1.TEAM = A.TEAM
;QUIT;

How many codes are you looking at? It may be easier to place a list in a data set as shown than to write some sort of moderately ugly macro code to substitute.

 

By dynamic do you mean this list is generated elsewhere in data or code? If so, show us how that goes because replacing a macro variable with a list is about the only other simple approach.

 

gabrielstraliot
Calcite | Level 5

I simulated the data to better exemplify:

 

DATA WORK.Transactions;

INFILE DATALINES dlm = ",";
INPUT ACCOUNT $ TYPE $ VALUE DATE :ddmmyy10.;
format DATE :ddmmyy10.;

DATALINES;
00001-1,C,20,14/06/2022
00002-2,D,1,20/06/2022
00003-3,D,28,16/05/2022
00001-2,D,100,04/07/2022
00002-3,D,50,11/07/2022
00003-4,C,50,27/06/2022
00001-3,C,30,18/07/2022
00001-1,D,500,09/05/2022
00002-2,C,10000,14/06/2022
00003-3,D,10000,01/08/2022
00001-2,C,10,16/05/2022
00002-3,D,12,14/06/2022
00003-4,C,90,06/06/2022
00001-3,D,2,04/07/2022
00001-1,D,20,14/06/2022
00002-2,D,1,18/07/2022
00003-3,C,28,14/06/2022
00001-2,D,100,09/05/2022
00002-3,C,50,11/07/2022
00003-4,D,50,18/07/2022
00001-3,C,30,02/05/2022
00001-1,D,500,25/07/2022
00002-2,C,10000,30/05/2022
00003-3,C,10000,25/07/2022
00001-2,C,10,01/08/2022
00002-3,D,12,14/06/2022
00003-4,C,90,25/07/2022
00001-3,D,2,30/05/2022
;RUN;

 

DATA WORK.ACCOUNTS;
INFILE DATALINES dlm = ",";

INPUT NAME $ ACCOUNT $ MONTH :ddmmyy10. SELECT $;
format MONTH MMYYS.;

DATALINES;
Jacoby Brook,00001-1,01/06/22,Y
Sample Billy,00002-2,01/06/22,N
Bando Chris,00003-3,01/06/22,N
Castillo Carmen,00001-2,01/06/22,Y
Hall Mel,00003-4,01/06/22,N
Barrett Marty,00002-3,01/07/22,N
Hall Mel,00003-4,01/07/22,Y
Moreno Omar,00001-3,01/07/22,Y
Jacoby Brook,00001-1,01/07/22,N
;RUN;

 

This is an example of how I do it. I need to separate transactions. Thanks for the LEFT JOIN suggestion. But can I do it by DATA STEP to perform better
?

 

%LET DATAI = %SYSFUNC(INTNX(MONTH, %SYSFUNC(TODAY()), -1, B), DATE9.);
%LET DATAF = %SYSFUNC(INTNX(MONTH, %SYSFUNC(TODAY()), -1, E), DATE9.);


PROC SQL;
CREATE TABLE WORK.FINAL_LIST_SQL AS
SELECT t1.ACCOUNT,
t2.NAME,
t1.TYPE,
t1.VALUE,
t1.DATE
FROM WORK.TRANSACTIONS t1
LEFT JOIN WORK.ACCOUNTS T2 ON t1.ACCOUNT = t2.ACCOUNT
WHERE T1.DATE BETWEEN "&DATAI."D AND "&DATAF."D AND T2.SELECT = 'Y' AND T2.MONTH BETWEEN "&DATAI."D AND "&DATAF."D
;QUIT;

 

 

Kurt_Bremser
Super User

Filtering for values contained in another dataset is best done with a hash object.

Say you have two datasets "have" and "lookup", both containing a variable "key".

data want;
set have;
if _n_ =1
then do;
  declare hash l (dataset:"lookup");
  l.definekey("key");
  l.definedone();
end;
if l.check() = 0;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 4218 views
  • 1 like
  • 3 in conversation