05-22-2015 12:09 PM
Hello all! SAS noob here. I have a dataset with thousands of observations. There are quite a few variables, e.g. Group, Subgroup, Title. There is one variable called the "Code". I have a list of about 1000 codes that I want to include in the dataset. I would like to create a new dataset in which all the "undesirable" observations (e.g. the observations whose codes are NOT on my list) are deleted. How can I go about this?
Thank you so much! Any help or tips would be greatly appreciated.
05-22-2015 12:59 PM
Hello! Thanks so much for replying. This is what the SAS log says when I run your proc sql code:
6852 proc sql;
6853 create table validData as
6854 select * from LIBRARY.NOCTRY4
6855 where CODE in (select CODE from NOC75);
NOTE: Table WORK.VALIDDATA created, with 1614 rows and 13 columns.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.04 seconds
cpu time 0.03 seconds
The only problem is that the dataset NOC75 (which is the list of codes I want to include) has only 1106 observations, but the new table "validData" has 1614 observations. The original data set, NOCTRY4, has 1666 observations, so I can infer that not all of the undesirable NOCs were not deleted.
05-22-2015 03:06 PM
Did you actually verify your input and output results to confirm the inference? Possibly there are/were duplicates? Consider using PROC FREQ to analyze both files to confirm as desk-checking the results, based on some known input.
05-22-2015 03:38 PM
There must exist many records in LIBRARY.NOCTRY4 that share the same CODE. To find them out, try running :
create table duplicates as
select *, count(CODE) as n
group by CODE
having count(CODE) > 1;
Need further help from the community? Please ask a new question.