Re: Looping over rows to perform search

jlin4 · Posted 07-23-2021 02:26 PM

Supposed I have a dataset work.id_score, and I intend to search it for a specific ID. If that ID is found in the dataset, then it will return me "Y". Otherwise, it will return a "N". I have written the codes below to do so if I only have one ID to search for.

However, should I have a list of ID, such as in work.id, then how should I go about searching for them at once? (Supposed that I have to make use of the codes written for the case of 1 ID of interest since this is a simplified example) Then desired result is work.id_results, where the results (Y or N) is returned as a new column next to the ID column. Thank you!

data work.id_score;
input ID$ score;
cards;
abc 10
def 20
ghi 30
abc 15
def 25
;
run;

%let id_search = "abc";

proc sql; create table work.id_count as
	select *
	from work.id_score
	where ID = &id_search;
quit;

proc sql; select case when count(*) > 0 then "Y" else "N" end as search_outcome
	from work.id_count;
quit;

data work.id;
input ID$;
cards;
abc
def
xyz
pqr
;
run;

data work.id_results;
input ID$ search_outcome$;
cards;
abc Y
def Y
xyz N
pqr N
;
run;

Rydhm · Posted 07-23-2021 03:17 PM

With your method, you will need to create a macro variable with the list of Ids that you want to search. Then loop through the list and apply proc sql for each element. This will be doable but lengthy. Instead, you can use merge as follows.

data work.id_score;
input ID$ score;
cards;
abc 10
def 20
ghi 30
abc 15
def 25
;
run;

data work.id;
input ID$;
cards;
abc
def
xyz
pqr
;
run;

proc sort data=id_score; by id; run;
proc sort data=id; by id; run;

data search;
merge id_score(in=a) id(in=b);
   by id;
   if a and b then search_outcome='Y';
   else search_outcome='N';
   if b;
   keep id search_outcome;
run;

proc sort data=search nodupkey; by id; run;

proc print data=search; run;

jimbarbour · Posted 07-23-2021 03:19 PM

There are multiple ways you could do it. Have you considered using a hash table? If you load your "look up" dataset, which would be work.id_score, into a hash table, you could do a read on the hash table. If you get a zero return code, you would set the result to 'Y'. If you get a non zero return code, that would indicate that there is no match, and you would set the result to 'N'. See example code, below, and, below that, the results.

Jim

data work.id_score;
input ID$ score;
cards;
abc 10
def 20
ghi 30
abc 15
def 25
;
run;

data work.id;
input ID$;
cards;
abc
def
xyz
pqr
;
run;

DATA	WORK.id_results;
	DROP	_:;
	SET	WORK.ID;

	**------------------------------**;
	**	Define hash tables.			**;
	**------------------------------**;
	IF	_N_							=	1	THEN
		DO;
			DECLARE	HASH	Hsh_ID_Table			(ORDERED: 'A', MULTIDATA: 'N', DATASET: 'work.id_score');
							Hsh_ID_Table.DefineKey 	('ID');
							Hsh_ID_Table.DefineData	('ID');
							Hsh_ID_Table.DefineDone	();
		END;

	**------------------------------**;
	**	Look up ID.					**;
	**------------------------------**;
	_RC								=	Hsh_ID_Table.FIND();
	IF	_RC							=	0	THEN
		DO;
			Search_Outcome			=	'Y';
		END;
	ELSE
		DO;
			Search_Outcome			=	'N';
		END;
RUN;

ballardw · Posted 07-23-2021 03:20 PM

@jlin4 wrote:

Supposed I have a dataset work.id_score, and I intend to search it for a specific ID. If that ID is found in the dataset, then it will return me "Y". Otherwise, it will return a "N". I have written the codes below to do so if I only have one ID to search for.

However, should I have a list of ID, such as in work.id, then how should I go about searching for them at once? (Supposed that I have to make use of the codes written for the case of 1 ID of interest since this is a simplified example) Then desired result is work.id_results, where the results (Y or N) is returned as a new column next to the ID column. Thank you!
data work.id_score;
input ID$ score;
cards;
abc 10
def 20
ghi 30
abc 15
def 25
;
run;
%let id_search = "abc";

proc sql; create table work.id_count as
	select *
	from work.id_score
	where ID = &id_search;
quit;

proc sql; select case when count(*) > 0 then "Y" else "N" end as search_outcome
	from work.id_count;
quit;
data work.id;
input ID$;
cards;
abc
def
xyz
pqr
;
run;

data work.id_results;
input ID$ search_outcome$;
cards;
abc Y
def Y
xyz N
pqr N
;
run;    

Here is a way that creates a numeric 1/0 outcome variable. Since SAS will treat 1 as true (or yes) and 0 as false (or no) for logic the 1/0 coding is usually a better way to go then character values like 'Y' and 'N'.

proc sql;
   create table want as 
   select a.id, (a.id=b.id) as outcome
   from work.id as a left join
        (select distinct id from work.id_score) as b
        on a.id=b.id
   ;
quit;

You can get the count of "yes" with a sum function, the percent yes as a mean function result, any "yes" if the max is 1, all "yes" if the min is 1, all the same if the range is 0. Which could require some interesting coding to get with character values.

If you want to display Y/N Yes/No or similar than a custom format will do that just fine.

PGStats · Posted 07-23-2021 04:13 PM

The simpler the better 😉

proc sql;
create table want as 
select 
	id, 
	id in (select id from id) as outcome
from id_score;

PG

Looping over rows to perform search