BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Krueger1
Fluorite | Level 6

I have to sort by NAME and then DATE. Once this is complete I'm trying to determine the UNIQUE frequency of each ID occurring between any ID's -4. 

 

Currently if ID = 1 then 2 then 1 it'll count as 3 when I'm needing that to count as 2. Below is examples.

 

data have;
input NAME $ Flag DATE ID RESULT;
format DATE yymmdd10.;
cards;
Joe 0 20190101 1 500
Joe 0 20190102 1 500
Joe 0 20190103 1 500
Joe 0 20190104 1  -4
Joe 1 20190121 1 500
Joe 0 20190122 1 500
Tom 0 20190104 1 500
Tom 0 20190101 1  -4
Tom 0 20190102 1 500
Tom 0 20190103 2 500
Tom 0 20190104 3 500
Tom 0 20190105 1 500
Tom 0 20190106 2 500
Tom 0 20190107 7  -4
;
run;

 

data want;
input NAME $ DATE ID RESULT want;
format DATE yymmdd10.;
cards;
Joe 0 20190101 1 500 1
Joe 0 20190102 1 500 1
Joe 0 20190103 1 500 1
Joe 0 20190104 1 -4 1
Joe 1 20190121 1 500 1
Joe 0 20190122 1 500 1
Tom 0 20190104 1 500 1
Tom 0 20190101 1 -4 1
Tom 0 20190102 1 500 1
Tom 0 20190103 2 500 2
Tom 0 20190104 3 500 3
Tom 0 20190105 1 500 3
Tom 0 20190106 2 500 3
Tom 0 20190107 7 -4  4
;
run;

*existing non-working code;
data want;
set have;
by name id notsorted;
retain have;
if first.name or lag(id) = -4 then card_Count = 1;
else if first.id then card_Count + 1; 
run;
1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @Krueger1,

 

You need to create some sort of lookup table to store the information about the distinct IDs you've already seen in the current group of observations.

 

I would use a hash object for this purpose:

data want;
if _n_=1 then do;
  dcl hash h();
  h.definekey('id');
  h.definedone();
end;
set have;
by name date;
if first.name or lag(result) = -4 then h.clear();
h.ref();
card_Count=h.num_items;
run;

(This assumes that dataset HAVE has been sorted by NAME DATE, as you mentioned.)

 

If you're not familiar with hash objects, you may want to use a temporary array instead. Depending on the ID values, you can use

  • an array indexed by ID values (assuming that ID values are integers between 1 and the array dimension, e.g., 999)
data want;
set have;
by name date;
array c[999] _temporary_;
if first.name or lag(result) = -4 then call missing(of c[*]);
c[id]=1;
card_Count=sum(of c[*]);
run;
  • or an array containing (up to, e.g., 999) distinct ID values:
data want;
set have;
by name date;
array c[999] _temporary_;
if first.name or lag(result) = -4 then do;
  card_Count=0;
  call missing(of c[*]);
end;
if id ~in c then do;
  card_Count+1;
  c[card_Count]=id;
end;
run;

 

 

View solution in original post

3 REPLIES 3
FreelanceReinh
Jade | Level 19

Hello @Krueger1,

 

You need to create some sort of lookup table to store the information about the distinct IDs you've already seen in the current group of observations.

 

I would use a hash object for this purpose:

data want;
if _n_=1 then do;
  dcl hash h();
  h.definekey('id');
  h.definedone();
end;
set have;
by name date;
if first.name or lag(result) = -4 then h.clear();
h.ref();
card_Count=h.num_items;
run;

(This assumes that dataset HAVE has been sorted by NAME DATE, as you mentioned.)

 

If you're not familiar with hash objects, you may want to use a temporary array instead. Depending on the ID values, you can use

  • an array indexed by ID values (assuming that ID values are integers between 1 and the array dimension, e.g., 999)
data want;
set have;
by name date;
array c[999] _temporary_;
if first.name or lag(result) = -4 then call missing(of c[*]);
c[id]=1;
card_Count=sum(of c[*]);
run;
  • or an array containing (up to, e.g., 999) distinct ID values:
data want;
set have;
by name date;
array c[999] _temporary_;
if first.name or lag(result) = -4 then do;
  card_Count=0;
  call missing(of c[*]);
end;
if id ~in c then do;
  card_Count+1;
  c[card_Count]=id;
end;
run;

 

 

Krueger1
Fluorite | Level 6

This is great and makes a lot of sense! My ID's in actual dataset are closer to GUID's then simple ID's but this is a great base and I'll definitely do some more research into HASH. Thank you!

FreelanceReinh
Jade | Level 19

@Krueger1 wrote:

My ID's in actual dataset are closer to GUID's then simple ID's ...


Unlike the first array solution, the second can be modified easily to work with character IDs. Just insert a suitable length specification into the array definition, e.g.,

array c[999] $16 _temporary_;

if the IDs have a length <=16.

 

The hash object solution works with character IDs as well because it uses the type and length of the ID variable in dataset HAVE.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 472 views
  • 1 like
  • 2 in conversation