I have to sort by NAME and then DATE. Once this is complete I'm trying to determine the UNIQUE frequency of each ID occurring between any ID's -4.
Currently if ID = 1 then 2 then 1 it'll count as 3 when I'm needing that to count as 2. Below is examples.
data have;
input NAME $ Flag DATE ID RESULT;
format DATE yymmdd10.;
cards;
Joe 0 20190101 1 500
Joe 0 20190102 1 500
Joe 0 20190103 1 500
Joe 0 20190104 1 -4
Joe 1 20190121 1 500
Joe 0 20190122 1 500
Tom 0 20190104 1 500
Tom 0 20190101 1 -4
Tom 0 20190102 1 500
Tom 0 20190103 2 500
Tom 0 20190104 3 500
Tom 0 20190105 1 500
Tom 0 20190106 2 500
Tom 0 20190107 7 -4
;
run;
data want;
input NAME $ DATE ID RESULT want;
format DATE yymmdd10.;
cards;
Joe 0 20190101 1 500 1
Joe 0 20190102 1 500 1
Joe 0 20190103 1 500 1
Joe 0 20190104 1 -4 1
Joe 1 20190121 1 500 1
Joe 0 20190122 1 500 1
Tom 0 20190104 1 500 1
Tom 0 20190101 1 -4 1
Tom 0 20190102 1 500 1
Tom 0 20190103 2 500 2
Tom 0 20190104 3 500 3
Tom 0 20190105 1 500 3
Tom 0 20190106 2 500 3
Tom 0 20190107 7 -4 4
;
run;
*existing non-working code;
data want;
set have;
by name id notsorted;
retain have;
if first.name or lag(id) = -4 then card_Count = 1;
else if first.id then card_Count + 1;
run;
Hello @Krueger1,
You need to create some sort of lookup table to store the information about the distinct IDs you've already seen in the current group of observations.
I would use a hash object for this purpose:
data want;
if _n_=1 then do;
dcl hash h();
h.definekey('id');
h.definedone();
end;
set have;
by name date;
if first.name or lag(result) = -4 then h.clear();
h.ref();
card_Count=h.num_items;
run;
(This assumes that dataset HAVE has been sorted by NAME DATE, as you mentioned.)
If you're not familiar with hash objects, you may want to use a temporary array instead. Depending on the ID values, you can use
data want;
set have;
by name date;
array c[999] _temporary_;
if first.name or lag(result) = -4 then call missing(of c[*]);
c[id]=1;
card_Count=sum(of c[*]);
run;
data want;
set have;
by name date;
array c[999] _temporary_;
if first.name or lag(result) = -4 then do;
card_Count=0;
call missing(of c[*]);
end;
if id ~in c then do;
card_Count+1;
c[card_Count]=id;
end;
run;
Hello @Krueger1,
You need to create some sort of lookup table to store the information about the distinct IDs you've already seen in the current group of observations.
I would use a hash object for this purpose:
data want;
if _n_=1 then do;
dcl hash h();
h.definekey('id');
h.definedone();
end;
set have;
by name date;
if first.name or lag(result) = -4 then h.clear();
h.ref();
card_Count=h.num_items;
run;
(This assumes that dataset HAVE has been sorted by NAME DATE, as you mentioned.)
If you're not familiar with hash objects, you may want to use a temporary array instead. Depending on the ID values, you can use
data want;
set have;
by name date;
array c[999] _temporary_;
if first.name or lag(result) = -4 then call missing(of c[*]);
c[id]=1;
card_Count=sum(of c[*]);
run;
data want;
set have;
by name date;
array c[999] _temporary_;
if first.name or lag(result) = -4 then do;
card_Count=0;
call missing(of c[*]);
end;
if id ~in c then do;
card_Count+1;
c[card_Count]=id;
end;
run;
This is great and makes a lot of sense! My ID's in actual dataset are closer to GUID's then simple ID's but this is a great base and I'll definitely do some more research into HASH. Thank you!
@Krueger1 wrote:
My ID's in actual dataset are closer to GUID's then simple ID's ...
Unlike the first array solution, the second can be modified easily to work with character IDs. Just insert a suitable length specification into the array definition, e.g.,
array c[999] $16 _temporary_;
if the IDs have a length <=16.
The hash object solution works with character IDs as well because it uses the type and length of the ID variable in dataset HAVE.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.