BookmarkSubscribeRSS Feed
DavidD
Calcite | Level 5
Yo anybody out there . . . I am using SAS EG (new user) and have had Base SAS courses as well as EG but alas, I am back here at my workstation and have an issue. I have got a data set with 2 million records. My pal in another department has given me a seperate data set that lists 19 thousand of those very same records that should not be in my data set. So I have to remove the 19K that all have unique reciept numbers. Should I create a SAS program to do this? Or is there something in EG to do this. If I write a SAS program do I subset the data and then delete? Any ideas to give . . ???
8 REPLIES 8
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
It's unclear whether you have 'external' files or SAS files? If you have SAS files and the two are identical, you should be able to combine the two files and remove any duplicates. If, however, the two SAS files are not truly identical with all columns, you will need to consider identifying a set of "key" variables to combine the two files and then remove the duplicates.

If you have 'externa' files, you will need to import the two files, and then perform the process listed above.

Scott Barry
SBBWorks, Inc.
DavidD
Calcite | Level 5
Thank you. They are both in SAS files now but I also want to delete the record and thus sanitize the data base of these records.
LinusH
Tourmaline | Level 20
On way of doing this (depends on your data) is to use a subquery in a SQL where expression:

...
where your_table.key_col not in(select other_table.keycol from other_table)
;
quit;

Can't see anyway of doing this the point-and-click way, at least in EG 4.1. Bot you could code in the advanced expression builder in the Filter and query object.

/Linus
Data never sleeps
ChrisHemedinger
Community Manager
You can do it in the query builder, I think, if I understand the question correctly.

1. Add the two tables into the query, T1 (full set) and T2 (to exclude). Join on the ID (receipt number).

2. Change the Join type to Left Join (assuming the Big table is first, and "exclude" table is second). This considers ALL rows from T1.

3. Add filter where T2.ID IS MISSING.

Resulting table will have all of the records from T1 that don't have a match in T2.

Chris
It's time to register for SAS Innovate! Join your SAS user peers in Las Vegas on April 16-19 2024.
DavidD
Calcite | Level 5
Best one I have heard!! Thank you!!
DavidD
Calcite | Level 5
You know when I put in T2.ID is missing in the actual case its where t2.Receipt# is missing it does not like the syntax. Is this an SQL problem where I need parens or brackets? Thanx
ChrisHemedinger
Community Manager
Ooh, the column name is "Receipt#" ? With the hash symbol?

The query builder is *supposed* to build that as something like

T1."Receipt#"n

The quote-quote-n is the literal syntax to allow SAS to process column names that don't follow the traditional naming rules.

As long as OPTIONS VALIDVARNAME=ANY is in effect, which EG does set by default.

Chris
It's time to register for SAS Innovate! Join your SAS user peers in Las Vegas on April 16-19 2024.
RichardH_sas
SAS Employee
Chris' solution is definitely the best point-and-click one I know of. FYI, if you wanted to write some code instead, there is a set operator EXCEPT that's part of SQL and is designed to do exactly what you describe: rows in one data set that are not in the other.

proc sql;
create table work.modified as
select id, name
from work.large
EXCEPT
select id, name
from work.small;
quit;

The new table would contain the columns ID and NAME, and only unique rows from the first data set (work.large) not found in the second data set (work.small).

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 857 views
  • 0 likes
  • 5 in conversation