Hello all! SAS noob here. I have a dataset with thousands of observations. There are quite a few variables, e.g. Group, Subgroup, Title. There is one variable called the "Code". I have a list of about 1000 codes that I want to include in the dataset. I would like to create a new dataset in which all the "undesirable" observations (e.g. the observations whose codes are NOT on my list) are deleted. How can I go about this?
Thank you so much! Any help or tips would be greatly appreciated.
A simple SQL query will do that
proc sql;
create table validData as
select * from myData
where Code in (select Code from myCodes);
quit;
PG
A simple SQL query will do that
proc sql;
create table validData as
select * from myData
where Code in (select Code from myCodes);
quit;
PG
Hello! Thanks so much for replying. This is what the SAS log says when I run your proc sql code:
6852 proc sql;
6853 create table validData as
6854 select * from LIBRARY.NOCTRY4
6855 where CODE in (select CODE from NOC75);
NOTE: Table WORK.VALIDDATA created, with 1614 rows and 13 columns.
6856 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.04 seconds
cpu time 0.03 seconds
The only problem is that the dataset NOC75 (which is the list of codes I want to include) has only 1106 observations, but the new table "validData" has 1614 observations. The original data set, NOCTRY4, has 1666 observations, so I can infer that not all of the undesirable NOCs were not deleted.
Did you actually verify your input and output results to confirm the inference? Possibly there are/were duplicates? Consider using PROC FREQ to analyze both files to confirm as desk-checking the results, based on some known input.
There must exist many records in LIBRARY.NOCTRY4 that share the same CODE. To find them out, try running :
proc sql;
create table duplicates as
select *, count(CODE) as n
from LIBRARY.NOCTRY4
group by CODE
having count(CODE) > 1;
quit;
PG
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.