Solved: Re: Matching variable across observations in data step

ebowen · Posted 08-27-2013 11:07 AM

I'm trying to figure out a way to compare the value of one variable with a different variable across different observations within a data step, and then output those observations where the data matches. I'm not quite sure if I'm going about this the right way, but I've been trying to do this with a hash table, where I essentially read the dataset in twice -- once in the hash table, and again through the data step. I'm then trying to find all observations in the hash table that match three different criteria -- where the PlantID and PrimeMover are the same, and where the primary fuel matches the secondary fuel. Here is my code so far:

DATA Test2;

if _N_=1 then do;

declare hash h(dataset:'Test', MULTIDATA:'YES');

h.definekey('PlantID','PrimeMover','Secondary');

h.definedata('Primary','Secondary');

h.Definedone();

end;

set Test;

if h.Find(key:PlantID,key:PrimeMover,key:Primary)=0 then output;

RUN;

And here's some sample data:

Year	PlantID	PrimeMover	Primary	Secondary
2012	3	ST	NG	NG
2012	3	ST	NG	FO2
2012	3	ST	FO2	NG
2012	3	ST	NG	BIT
2012	4	HY	WAT	WAT
2012	7	ST	BIT	BIT
2012	7	ST	BIT	NG
2012	7	ST	NG	BIT
2012	8	ST	BIT	WC
2012	9	GT	NG	DFO
2012	9	GT	NG	NG
2012	10	GT	NG	DFO
2012	10	GT	NG	NG
2012	10	ST	BIT	BIT
2012	10	ST	BIT	WC
2012	26	ST	DFO	DFO
2012	26	ST	BIT	DFO
2012	26	ST	BIT	BIT
2012	26	ST	BIT	WC

If you take plant 3, for example, I want to output the first three observations, because the secondary fuel in the third observation matches the primary fuel in the first observation, and the secondary fuel in the second observation matches the primary fuel in the first observation. I don't care about the fourth observation because its secondary fuel is not overlapping with any of the primary fuels of the others. I know I'm not handling the hash multidata correctly at this point, but I'm mostly concerned with the find statement where I'm trying to get the observations where the secondary key matches the primary field of the current observation in the data step. I'm new to hashtags, so this may not be the best approach. I had also thought about writing a macro that would loop through the dataset twice to get those observations that match each other, but I figured I would try the hashtag approach first. Thanks for your help!

Reeza · Posted 08-27-2013 12:21 PM

Is this close to what you're looking for? PS. Hashtags are different than Hash Tables, which is different from Hash

Basically join the table to itself and if the secondary matches the primary and the plant id is the same, but take only distinct records.

data have;

input Year PlantID PrimeMover $ Primary $ Secondary $;

cards;

2012 3 ST NG NG

2012 3 ST NG FO2

2012 3 ST FO2 NG

2012 3 ST NG BIT

2012 4 HY WAT WAT

2012 7 ST BIT BIT

2012 7 ST BIT NG

2012 7 ST NG BIT

2012 8 ST BIT WC

2012 9 GT NG DFO

2012 9 GT NG NG

2012 10 GT NG DFO

2012 10 GT NG NG

2012 10 ST BIT BIT

2012 10 ST BIT WC

2012 26 ST DFO DFO

2012 26 ST BIT DFO

2012 26 ST BIT BIT

2012 26 ST BIT WC

;;;;

proc sql;

create table want as

select distinct a.*

from have as a

join have as b

on a.Secondary=b.Primary

and a.PlantID=b.PlantID

and a.Primemover=b.primemover;

quit;

EDIT: add in condition for primemover to be the same as well as plant id.

View solution in original post

Reeza · Posted 08-27-2013 12:21 PM

Is this close to what you're looking for? PS. Hashtags are different than Hash Tables, which is different from Hash

Basically join the table to itself and if the secondary matches the primary and the plant id is the same, but take only distinct records.

data have;

input Year PlantID PrimeMover $ Primary $ Secondary $;

cards;

2012 3 ST NG NG

2012 3 ST NG FO2

2012 3 ST FO2 NG

2012 3 ST NG BIT

2012 4 HY WAT WAT

2012 7 ST BIT BIT

2012 7 ST BIT NG

2012 7 ST NG BIT

2012 8 ST BIT WC

2012 9 GT NG DFO

2012 9 GT NG NG

2012 10 GT NG DFO

2012 10 GT NG NG

2012 10 ST BIT BIT

2012 10 ST BIT WC

2012 26 ST DFO DFO

2012 26 ST BIT DFO

2012 26 ST BIT BIT

2012 26 ST BIT WC

;;;;

proc sql;

create table want as

select distinct a.*

from have as a

join have as b

on a.Secondary=b.Primary

and a.PlantID=b.PlantID

and a.Primemover=b.primemover;

quit;

EDIT: add in condition for primemover to be the same as well as plant id.

ebowen · Posted 08-28-2013 09:59 PM

Thanks for this solution. I ended up using a different SQL approach where I counted the duplicate secondary sources within each primary source. This had the added benefit of giving me the number of overlapping sources, which I will need for later analysis.

Matching variable across observations in data step

Re: Matching variable across observations in data step

Re: Matching variable across observations in data step

Re: Matching variable across observations in data step

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away