BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
anoopm7
Calcite | Level 5

I need to find out if some specific numeric values are there in a dataset (have millions of records) for a particular field lets say 'city'. One way of doing this is to do a freq on that variable (called as city) and then compare the output and if there is match do print the details out. But I don't want to do freq as its time consuming. Is there a better way to achieve the results ?

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Depending on what you need something like this can give you count of the instances of specific values

proc sql;

     select variable, Count(*) as valuecount

     from dataset

     where variable in (value1, value2, value3)

     group by variable;

quit;

if the values are strings then enclose them in quotes but the results are case sensitive unless you use Upcase or lowcase such as upcase(variable) in (value list);

View solution in original post

6 REPLIES 6
ballardw
Super User

Depending on what you need something like this can give you count of the instances of specific values

proc sql;

     select variable, Count(*) as valuecount

     from dataset

     where variable in (value1, value2, value3)

     group by variable;

quit;

if the values are strings then enclose them in quotes but the results are case sensitive unless you use Upcase or lowcase such as upcase(variable) in (value list);

Reeza
Super User

Assuming that the values are in a tables

PROC SQL;

Create Table Match as

select Variable

from TableA

where Variable in (Select Variable from TableB)

Order by variable;

quit;

*Non matches;

PROC SQL;

Create Table Non_Match as

select Variable

from TableA

where Variable not in (select Variable from TableB)

order by variable;

quit;

Astounding
PROC Star

The real answer depends on what you really need.  For example, do you need to know whether CITY is ever "Boston" on some records, and "New York" on others?  Or do you need to know whether CITY is ever either "Boston" or "New York" on any record?  At any rate, here's a much faster tool to get a yes/no answer for a single CITY value:

%let city=N;

data _null_;

   set have (keep=city);

   where city="Boston";

   call symputx('city', 'Y');

   stop;

run;

It stops processing as soon as it finds a single "Boston" in the data, so you don't run up much of a bill.  Then you can use the macro variable &CITY later on in the program.  But it all depends on what you really need.

FriedEgg
SAS Employee

The same functionality denoted by can be accomplished with PROC SQL as well by using the INOBS=1 option

You will get a boolean response in the macro variable assigned

20         proc sql noprint inobs=1;

21         select count(*) into :acura from sashelp.cars where make='Acura';

WARNING: Only 1 records were read from SASHELP.CARS due to INOBS= option.

22         quit;

23        

24         %put &acura;

1

20         proc sql noprint inobs=1;

21         select count(*) into :acura from sashelp.cars where make='Foo';

WARNING: Only 1 records were read from SASHELP.CARS due to INOBS= option.

22         quit;

23        

24         %put &acura;

0

jakarman
Barite | Level 11

There are lot better ways to do that. Most of them are requiring you know something about your data and the effects on the system of your code.

When your data is in an external DBMS it will be different to a local SAS-dataset

Is your data pre-sorted or indexed on that field or must you process all data in a sequential way?  With this: will the processing occur often or just once.

Some procedures are multithreading others are not. Are you having benefits of the multithreading or will the IO pose a new bottleneck?  

---->-- ja karman --<-----
anoopm7
Calcite | Level 5

I did it as below

proc sql;

select count(*) into:nobs from test&i.;

quit;

%if &nobs> 0 %then %do ;

data princheck&i. ;

set "&&fln&i." ;

where TTN in ('9038')

run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 2761 views
  • 7 likes
  • 6 in conversation