DATA Step, Macro, Functions and more

Removing certain observations

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 16
Accepted Solution

Removing certain observations

Hello all! SAS noob here. I have a dataset with thousands of observations. There are quite a few variables, e.g. Group, Subgroup, Title. There is one variable called the "Code". I have a list of about 1000 codes that I want to include in the dataset. I would like to create a new dataset in which all the "undesirable" observations (e.g. the observations whose codes are NOT on my list) are deleted. How can I go about this?

Thank you so much! Any help or tips would be greatly appreciated.


Accepted Solutions
Solution
‎05-22-2015 12:25 PM
Respected Advisor
Posts: 4,927

Re: Removing certain observations

A simple SQL query will do that

proc sql;

create table validData as

select * from myData

where Code in (select Code from myCodes);

quit;

PG

PG

View solution in original post


All Replies
Solution
‎05-22-2015 12:25 PM
Respected Advisor
Posts: 4,927

Re: Removing certain observations

A simple SQL query will do that

proc sql;

create table validData as

select * from myData

where Code in (select Code from myCodes);

quit;

PG

PG
Occasional Contributor
Posts: 16

Re: Removing certain observations

Hello! Thanks so much for replying. This is what the SAS log says when I run your proc sql code:

6852  proc sql;

6853  create table validData as

6854  select * from LIBRARY.NOCTRY4

6855  where CODE in (select CODE from NOC75);

NOTE: Table WORK.VALIDDATA created, with 1614 rows and 13 columns.


6856  quit;


NOTE: PROCEDURE SQL used (Total process time):

      real time           0.04 seconds

      cpu time            0.03 seconds

The only problem is that the dataset NOC75 (which is the list of codes I want to include) has only 1106 observations, but the new table "validData" has 1614 observations. The original data set, NOCTRY4, has 1666 observations, so I can infer that not all of the undesirable NOCs were not deleted.

Super Contributor
Super Contributor
Posts: 3,174

Re: Removing certain observations

Did you actually verify your input and output results to confirm the inference?  Possibly there are/were duplicates?  Consider using PROC FREQ to analyze both files to confirm as desk-checking the results, based on some known input.      

Respected Advisor
Posts: 4,927

Re: Removing certain observations

There must exist many records in LIBRARY.NOCTRY4 that share the same CODE. To find them out, try running :

proc sql;

create table duplicates as

select *, count(CODE) as n 

from LIBRARY.NOCTRY4

group by CODE

having count(CODE) > 1;

quit;

PG

PG
🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 261 views
  • 1 like
  • 3 in conversation