Fluorite | Level 6

## Increment on out of order but distinct values?

I have to sort by NAME and then DATE. Once this is complete I'm trying to determine the UNIQUE frequency of each ID occurring between any ID's -4.

Currently if ID = 1 then 2 then 1 it'll count as 3 when I'm needing that to count as 2. Below is examples.

``````data have;
input NAME \$ Flag DATE ID RESULT;
format DATE yymmdd10.;
cards;
Joe 0 20190101 1 500
Joe 0 20190102 1 500
Joe 0 20190103 1 500
Joe 0 20190104 1  -4
Joe 1 20190121 1 500
Joe 0 20190122 1 500
Tom 0 20190104 1 500
Tom 0 20190101 1  -4
Tom 0 20190102 1 500
Tom 0 20190103 2 500
Tom 0 20190104 3 500
Tom 0 20190105 1 500
Tom 0 20190106 2 500
Tom 0 20190107 7  -4
;
run;``````

``````data want;
input NAME \$ DATE ID RESULT want;
format DATE yymmdd10.;
cards;
Joe 0 20190101 1 500 1
Joe 0 20190102 1 500 1
Joe 0 20190103 1 500 1
Joe 0 20190104 1 -4 1
Joe 1 20190121 1 500 1
Joe 0 20190122 1 500 1
Tom 0 20190104 1 500 1
Tom 0 20190101 1 -4 1
Tom 0 20190102 1 500 1
Tom 0 20190103 2 500 2
Tom 0 20190104 3 500 3
Tom 0 20190105 1 500 3
Tom 0 20190106 2 500 3
Tom 0 20190107 7 -4  4
;
run;

*existing non-working code;
data want;
set have;
by name id notsorted;
retain have;
if first.name or lag(id) = -4 then card_Count = 1;
else if first.id then card_Count + 1;
run;``````
1 ACCEPTED SOLUTION

Accepted Solutions

## Re: Increment on out of order but distinct values?

Hello @Krueger1,

You need to create some sort of lookup table to store the information about the distinct IDs you've already seen in the current group of observations.

I would use a hash object for this purpose:

``````data want;
if _n_=1 then do;
dcl hash h();
h.definekey('id');
h.definedone();
end;
set have;
by name date;
if first.name or lag(result) = -4 then h.clear();
h.ref();
card_Count=h.num_items;
run;``````

(This assumes that dataset HAVE has been sorted by NAME DATE, as you mentioned.)

If you're not familiar with hash objects, you may want to use a temporary array instead. Depending on the ID values, you can use

• an array indexed by ID values (assuming that ID values are integers between 1 and the array dimension, e.g., 999)
``````data want;
set have;
by name date;
array c[999] _temporary_;
if first.name or lag(result) = -4 then call missing(of c[*]);
c[id]=1;
card_Count=sum(of c[*]);
run;``````
• or an array containing (up to, e.g., 999) distinct ID values:
``````data want;
set have;
by name date;
array c[999] _temporary_;
if first.name or lag(result) = -4 then do;
card_Count=0;
call missing(of c[*]);
end;
if id ~in c then do;
card_Count+1;
c[card_Count]=id;
end;
run;``````

3 REPLIES 3

## Re: Increment on out of order but distinct values?

Hello @Krueger1,

You need to create some sort of lookup table to store the information about the distinct IDs you've already seen in the current group of observations.

I would use a hash object for this purpose:

``````data want;
if _n_=1 then do;
dcl hash h();
h.definekey('id');
h.definedone();
end;
set have;
by name date;
if first.name or lag(result) = -4 then h.clear();
h.ref();
card_Count=h.num_items;
run;``````

(This assumes that dataset HAVE has been sorted by NAME DATE, as you mentioned.)

If you're not familiar with hash objects, you may want to use a temporary array instead. Depending on the ID values, you can use

• an array indexed by ID values (assuming that ID values are integers between 1 and the array dimension, e.g., 999)
``````data want;
set have;
by name date;
array c[999] _temporary_;
if first.name or lag(result) = -4 then call missing(of c[*]);
c[id]=1;
card_Count=sum(of c[*]);
run;``````
• or an array containing (up to, e.g., 999) distinct ID values:
``````data want;
set have;
by name date;
array c[999] _temporary_;
if first.name or lag(result) = -4 then do;
card_Count=0;
call missing(of c[*]);
end;
if id ~in c then do;
card_Count+1;
c[card_Count]=id;
end;
run;``````

Fluorite | Level 6

## Re: Increment on out of order but distinct values?

This is great and makes a lot of sense! My ID's in actual dataset are closer to GUID's then simple ID's but this is a great base and I'll definitely do some more research into HASH. Thank you!

## Re: Increment on out of order but distinct values?

@Krueger1 wrote:

My ID's in actual dataset are closer to GUID's then simple ID's ...

Unlike the first array solution, the second can be modified easily to work with character IDs. Just insert a suitable length specification into the array definition, e.g.,

``array c[999] \$16 _temporary_;``

if the IDs have a length <=16.

The hash object solution works with character IDs as well because it uses the type and length of the ID variable in dataset HAVE.

Discussion stats
• 3 replies
• 469 views
• 1 like
• 2 in conversation