I have to sort by NAME and then DATE. Once this is complete I'm trying to determine the UNIQUE frequency of each ID occurring between any ID's -4.
Currently if ID = 1 then 2 then 1 it'll count as 3 when I'm needing that to count as 2. Below is examples.
data have;
input NAME $ Flag DATE ID RESULT;
format DATE yymmdd10.;
cards;
Joe 0 20190101 1 500
Joe 0 20190102 1 500
Joe 0 20190103 1 500
Joe 0 20190104 1 -4
Joe 1 20190121 1 500
Joe 0 20190122 1 500
Tom 0 20190104 1 500
Tom 0 20190101 1 -4
Tom 0 20190102 1 500
Tom 0 20190103 2 500
Tom 0 20190104 3 500
Tom 0 20190105 1 500
Tom 0 20190106 2 500
Tom 0 20190107 7 -4
;
run;
data want;
input NAME $ DATE ID RESULT want;
format DATE yymmdd10.;
cards;
Joe 0 20190101 1 500 1
Joe 0 20190102 1 500 1
Joe 0 20190103 1 500 1
Joe 0 20190104 1 -4 1
Joe 1 20190121 1 500 1
Joe 0 20190122 1 500 1
Tom 0 20190104 1 500 1
Tom 0 20190101 1 -4 1
Tom 0 20190102 1 500 1
Tom 0 20190103 2 500 2
Tom 0 20190104 3 500 3
Tom 0 20190105 1 500 3
Tom 0 20190106 2 500 3
Tom 0 20190107 7 -4 4
;
run;
*existing non-working code;
data want;
set have;
by name id notsorted;
retain have;
if first.name or lag(id) = -4 then card_Count = 1;
else if first.id then card_Count + 1;
run;
Hello @Krueger1,
You need to create some sort of lookup table to store the information about the distinct IDs you've already seen in the current group of observations.
I would use a hash object for this purpose:
data want;
if _n_=1 then do;
dcl hash h();
h.definekey('id');
h.definedone();
end;
set have;
by name date;
if first.name or lag(result) = -4 then h.clear();
h.ref();
card_Count=h.num_items;
run;
(This assumes that dataset HAVE has been sorted by NAME DATE, as you mentioned.)
If you're not familiar with hash objects, you may want to use a temporary array instead. Depending on the ID values, you can use
data want;
set have;
by name date;
array c[999] _temporary_;
if first.name or lag(result) = -4 then call missing(of c[*]);
c[id]=1;
card_Count=sum(of c[*]);
run;
data want;
set have;
by name date;
array c[999] _temporary_;
if first.name or lag(result) = -4 then do;
card_Count=0;
call missing(of c[*]);
end;
if id ~in c then do;
card_Count+1;
c[card_Count]=id;
end;
run;
Hello @Krueger1,
You need to create some sort of lookup table to store the information about the distinct IDs you've already seen in the current group of observations.
I would use a hash object for this purpose:
data want;
if _n_=1 then do;
dcl hash h();
h.definekey('id');
h.definedone();
end;
set have;
by name date;
if first.name or lag(result) = -4 then h.clear();
h.ref();
card_Count=h.num_items;
run;
(This assumes that dataset HAVE has been sorted by NAME DATE, as you mentioned.)
If you're not familiar with hash objects, you may want to use a temporary array instead. Depending on the ID values, you can use
data want;
set have;
by name date;
array c[999] _temporary_;
if first.name or lag(result) = -4 then call missing(of c[*]);
c[id]=1;
card_Count=sum(of c[*]);
run;
data want;
set have;
by name date;
array c[999] _temporary_;
if first.name or lag(result) = -4 then do;
card_Count=0;
call missing(of c[*]);
end;
if id ~in c then do;
card_Count+1;
c[card_Count]=id;
end;
run;
This is great and makes a lot of sense! My ID's in actual dataset are closer to GUID's then simple ID's but this is a great base and I'll definitely do some more research into HASH. Thank you!
@Krueger1 wrote:
My ID's in actual dataset are closer to GUID's then simple ID's ...
Unlike the first array solution, the second can be modified easily to work with character IDs. Just insert a suitable length specification into the array definition, e.g.,
array c[999] $16 _temporary_;
if the IDs have a length <=16.
The hash object solution works with character IDs as well because it uses the type and length of the ID variable in dataset HAVE.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.