Curious how SAS could efficiently do the following:
Given two text lists (say, of stock symbols) how can user subtract one list from the other?
Example:
AAAU
ACIM
ACWF
ACWX
ADRD
ADRE
ADRU
AGG
minus
ACIM
ACWF
AGG
We assume all members of List Two are included in List One.
From reading other's posts, Proc Compare and Proc SQL both look promising.
Any suggestions appreciated.
Nicholas Kormanik
Or:
proc sql;
delete from list1
where stock in (select stock from list2);
quit;
If the expected output is to have the list of text not in list2 but in list 1 the try the below code
data have1;
input list$;
cards;
AAAU
ACIM
ACWF
ACWX
ADRD
ADRE
ADRU
AGG
;
data have2;
input list$;
cards;
ACIM
ACWF
AGG
;
proc sort data=have1;
by list;
run;
proc sort data=have2;
by list;
run;
data want;
merge have1(in=a) have2(in=b);
by list;
if a and not b;
run;
proc sql;
create table list3 as
select stock from list1
except
select stock from list2;
quit;
It can be done without explicit sorting using Hash objects. Also we can use Proc Format. Here is the Hash way using the data sets given by Jag:
data want;
if _n_ = 1 then do;
if 0 then set have2;
declare hash h(dataset:'have2');
h.definekey('list');
h.definedone();
end;
set have1;
if h.check() ne 0;
run;
Or:
proc sql;
delete from list1
where stock in (select stock from list2);
quit;
No mention of Proc Compare yet? Most examples out there that seemed reasonably close to my question used Proc Compare.
PGStats looks to be offering short and efficient code. Nice tight quick solution.
Can anyone do better??
This is similar to @PGStats's "delete from" approach in that it removes observations from LIST1:
data list1;
modify list1 list2;
by stock;
remove;
run;
Like PROC SQL it does not require sorted input datasets -- despite the BY statement.
If LIST1 and LIST2 are the only datasets (in the WORK/USER library) whose names start with "LIST", the MODIFY statement could be written as
modify list:;
Just for the record, here's my full code:
/*
Help given by PGStats.
Remove outdated, depricated stock symbols
from entire symbol list.
Use MetaStock tester to isolate the bad symbols.
Copy full symbol list to C:\0_SAS_1.
Name it ... as appropriate.
Copy list of bad symbols to be removed
to C:\0_SAS_1. Name it "Remove.txt".
*/
data sas_1.ETF_List;
infile "C:\0_SAS_1\ETF List.txt";
input Stocks $;
run;
data sas_1.Remove;
infile "C:\0_SAS_1\Remove.txt";
input Stocks $;
run;
proc sql;
delete from SAS_1.ETF_List
where Stocks in (select Stocks from SAS_1.Remove);
quit;
proc export
data=SAS_1.ETF_List
outfile="C:\0_SAS_1\ETF List 2.txt";
putnames=no;
run;
Hi @NKormanik
If have2 is very small compared to have1, using a hash will be in an efficient way to achieve you goal.
There is no need to sort those tables or to index them prior to do that.
Best,
data have1;
input list$;
cards;
AAAU
ACIM
ACWF
ACWX
ADRD
ADRE
ADRU
AGG
;
run;
data have2;
input list$;
cards;
ACIM
ACWF
AGG
;
run;
data want;
if _n_ = 1 then do;
declare hash h (dataset:'have2');
h.definekey('list');
h.definedone();
end;
set have1;
if h.find() ne 0;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.