Solved: Re: (List One) minus (List Two)

NKormanik · Posted 12-11-2019 09:20 PM

Curious how SAS could efficiently do the following:

Given two text lists (say, of stock symbols) how can user subtract one list from the other?

Example:

AAAU

ACIM

ACWF

ACWX

ADRD

ADRE

ADRU

AGG

minus

ACIM

ACWF

AGG

We assume all members of List Two are included in List One.

From reading other's posts, Proc Compare and Proc SQL both look promising.

Any suggestions appreciated.

Nicholas Kormanik

PGStats · Posted 12-11-2019 10:53 PM

Or:

proc sql;
delete from list1
where stock in (select stock from list2);
quit;

PG

View solution in original post

Jagadishkatam · Posted 12-11-2019 09:31 PM

Please let us know if you are trying to identify the text not in either of the both list.

Thanks,
Jag

Jagadishkatam · Posted 12-11-2019 09:52 PM

If the expected output is to have the list of text not in list2 but in list 1 the try the below code

data have1;
input list$;
cards;
AAAU
ACIM
ACWF
ACWX
ADRD
ADRE
ADRU
AGG
;

data have2;
input list$;
cards;
ACIM
ACWF
AGG
;

proc sort data=have1;
by list;
run;

proc sort data=have2;
by list;
run;

data want;
merge have1(in=a) have2(in=b);
by list;
if a and not b;
run;

Thanks,
Jag

PGStats · Posted 12-11-2019 10:50 PM

proc sql;
create table list3 as
select stock from list1
except
select stock from list2;
quit;

PG

KachiM · Posted 12-11-2019 10:51 PM

It can be done without explicit sorting using Hash objects. Also we can use Proc Format. Here is the Hash way using the data sets given by Jag:

data want;
   if _n_ = 1 then do;
      if 0 then set have2;
      declare hash h(dataset:'have2');
      h.definekey('list');
      h.definedone();
   end;
   set have1;
   if h.check() ne 0;
run;

PGStats · Posted 12-11-2019 10:53 PM

Or:

proc sql;
delete from list1
where stock in (select stock from list2);
quit;

PG

NKormanik · Posted 12-12-2019 01:34 AM

No mention of Proc Compare yet? Most examples out there that seemed reasonably close to my question used Proc Compare.

PGStats looks to be offering short and efficient code. Nice tight quick solution.

Can anyone do better??

FreelanceReinh · Posted 12-12-2019 05:07 AM

This is similar to @PGStats's "delete from" approach in that it removes observations from LIST1:

data list1;
modify list1 list2;
by stock;
remove;
run;

Like PROC SQL it does not require sorted input datasets -- despite the BY statement.

If LIST1 and LIST2 are the only datasets (in the WORK/USER library) whose names start with "LIST", the MODIFY statement could be written as

modify list:;

NKormanik · Posted 12-13-2019 03:43 AM

Just for the record, here's my full code:


/*
Help given by PGStats.
Remove outdated, depricated stock symbols
 from entire symbol list.
Use MetaStock tester to isolate the bad symbols.
Copy full symbol list to C:\0_SAS_1.  
Name it ... as appropriate.
Copy list of bad symbols to be removed
 to C:\0_SAS_1.  Name it "Remove.txt".
*/

data sas_1.ETF_List;
infile "C:\0_SAS_1\ETF List.txt";
input Stocks $;
run;

data sas_1.Remove;
infile "C:\0_SAS_1\Remove.txt";
input Stocks $;
run;

proc sql;
delete from SAS_1.ETF_List
where Stocks in (select Stocks from SAS_1.Remove);
quit;

proc export
data=SAS_1.ETF_List
outfile="C:\0_SAS_1\ETF List 2.txt";
putnames=no;
run;

ed_sas_member · Posted 12-12-2019 03:37 AM

Hi @NKormanik

If have2 is very small compared to have1, using a hash will be in an efficient way to achieve you goal.

There is no need to sort those tables or to index them prior to do that.

Best,

data have1;
input list$;
cards;
AAAU
ACIM
ACWF
ACWX
ADRD
ADRE
ADRU
AGG
;
run;

data have2;
input list$;
cards;
ACIM
ACWF
AGG
;
run;

data want;
	if _n_ = 1 then do;
			declare hash h (dataset:'have2');
			h.definekey('list');
			h.definedone();
		end;
	set have1;
	if h.find() ne 0;
run;

Registration is open

SAS Training: Just a Click Away