BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
grande44
Calcite | Level 5

Hello all! SAS noob here. I have a dataset with thousands of observations. There are quite a few variables, e.g. Group, Subgroup, Title. There is one variable called the "Code". I have a list of about 1000 codes that I want to include in the dataset. I would like to create a new dataset in which all the "undesirable" observations (e.g. the observations whose codes are NOT on my list) are deleted. How can I go about this?

Thank you so much! Any help or tips would be greatly appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

A simple SQL query will do that

proc sql;

create table validData as

select * from myData

where Code in (select Code from myCodes);

quit;

PG

PG

View solution in original post

4 REPLIES 4
PGStats
Opal | Level 21

A simple SQL query will do that

proc sql;

create table validData as

select * from myData

where Code in (select Code from myCodes);

quit;

PG

PG
grande44
Calcite | Level 5

Hello! Thanks so much for replying. This is what the SAS log says when I run your proc sql code:

6852  proc sql;

6853  create table validData as

6854  select * from LIBRARY.NOCTRY4

6855  where CODE in (select CODE from NOC75);

NOTE: Table WORK.VALIDDATA created, with 1614 rows and 13 columns.


6856  quit;


NOTE: PROCEDURE SQL used (Total process time):

      real time           0.04 seconds

      cpu time            0.03 seconds

The only problem is that the dataset NOC75 (which is the list of codes I want to include) has only 1106 observations, but the new table "validData" has 1614 observations. The original data set, NOCTRY4, has 1666 observations, so I can infer that not all of the undesirable NOCs were not deleted.

sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10

Did you actually verify your input and output results to confirm the inference?  Possibly there are/were duplicates?  Consider using PROC FREQ to analyze both files to confirm as desk-checking the results, based on some known input.      

PGStats
Opal | Level 21

There must exist many records in LIBRARY.NOCTRY4 that share the same CODE. To find them out, try running :

proc sql;

create table duplicates as

select *, count(CODE) as n 

from LIBRARY.NOCTRY4

group by CODE

having count(CODE) > 1;

quit;

PG

PG

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 870 views
  • 1 like
  • 3 in conversation