BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sebster24
Quartz | Level 8

Hello,

I hope you guys are well.

DATA: The data is unsorted and hence I am using hash tables. Sorting this table would be a time-consuming effort. If there is no other option, then I will need to sit down for the gruesome sorting approach.

What I want: I am trying to enumerate a table variable "answer" with binary values (0/1) if variable filter = "Y" for the next 6 observations with the same client. In short if variable filter "Y" then this observation and the next 5 observations for same client should be assigned variable "answer" eq 1, else 0.

 

I have presented the "have" and "want" data below:


data have;
input client $ dates date9. filter $;
datalines ;
Fg5151 28.Feb.06 N
Fg5151 31.Mar.06 N
Fg5151 30.Apr.06 N
Fg5151 31.May.06 Y
Fg5151 30.Jun.06 N
Fg5151 31.Jul.06 Y
Fg5151 31.Aug.06 N
Fg5151 30.Sep.06 N
Fg5151 31.Oct.06 N
Fg5151 30.Nov.06 N
Fg5151 31.Dec.06 N
Fg5151 01.Jan.07 N
A101 28.Feb.06 N
A101 31.Mar.06 N
A101 30.Apr.06 Y
A101 31.May.06 N
A101 30.Jun.06 N
A101 31.Jul.06 N
ABC123 31.Mar.06 N
;


data want;
input client $ dates date9. filter $ answer 8;
datalines ;
A101 28.Feb.06 N 0
A101 31.Mar.06 N 0
A101 30.Apr.06 Y 1
A101 31.May.06 N 1
A101 30.Jun.06 N 1
A101 31.Jul.06 N 1
ABC123 31.Mar.06 N 0
Fg5151 28.Feb.06 N 0
Fg5151 31.Mar.06 N 0
Fg5151 30.Apr.06 N 0
Fg5151 31.May.06 Y 1
Fg5151 30.Jun.06 N 1
Fg5151 31.Jul.06 Y 1
Fg5151 31.Aug.06 N 1
Fg5151 30.Sep.06 N 1
Fg5151 31.Oct.06 N 1
Fg5151 30.Nov.06 N 1
Fg5151 31.Dec.06 N 1
Fg5151 01.Jan.07 N 0
;

 

 

I have compiled the HASH code below - The code works, but variable "answer" provides unsatisfactory results:

data _null_;
	set have end=last;
	retain counter;
	if _n_ = 1 then do;
		length newdate 8 answer 8;
		format newdate ddmmyy10.;
		declare hash hs(ordered: "Y", hashexp: 9);
		hs.defineKey("client","dates");
		hs.defineData("client","dates","filter","answer","newdate");
		hs.defineDone();
	end;
	rc = hs.find();
	if rc ne 0 then do;
		if filter eq "Y" then do;
			counter+0;
			by CLIENT filter notsorted;
			if first.filter then do;
				answer=1;
				newdate=intnx('month',dates,6,"same");
			end;
		end;
		hs.add();
	end;
	else do;
		counter=0;
	end;
	if last eq 1 then do;
	hs.output(dataset: "want");
	end;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Your want dataset does not match the request - you state you do not want to sort the data, but want is sorted?

You could do it using retain:

data want;
  set have;
  retain answer c;
  if _n_=1 or lag(client) ne client then do;
    answer=0;
    c=0;
  end;
  if filter="Y" then do;
    answer=1;
    c=1;
  end;
  else if answer=1 then c=c+1;
  if c > 6 then do;
    answer=0;
    c=0;
  end;
run;

However I would really advise sorting the data as unlike databases, SAS works better on sorted data (i.e. by group processing).

View solution in original post

4 REPLIES 4
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Your want dataset does not match the request - you state you do not want to sort the data, but want is sorted?

You could do it using retain:

data want;
  set have;
  retain answer c;
  if _n_=1 or lag(client) ne client then do;
    answer=0;
    c=0;
  end;
  if filter="Y" then do;
    answer=1;
    c=1;
  end;
  else if answer=1 then c=c+1;
  if c > 6 then do;
    answer=0;
    c=0;
  end;
run;

However I would really advise sorting the data as unlike databases, SAS works better on sorted data (i.e. by group processing).

sebster24
Quartz | Level 8

Hello RW9,

Firstly, thank you for your prompt response.

sorry about the confusion.

I meant to say table "have" is not sorted. The output table "want" will be sorted.

I am unable to run the sort operation on the original dataset as the dataset is really huge. A sort operation would take around 3 hours for each dataset - Hence sorting may possibly not be the best solution.

I have amended the code that i wrote and included your code snippet into my code. Then it works perfectly fine and produces the data.

However, if i move the data around  (i.e. unsort the data i have) then i do not get the want table. Code below:


data have;
input client $ dates date9. filter $;
datalines ;
Fg5151 31.Jul.06 Y
Fg5151 31.Aug.06 N
Fg5151 28.Feb.06 N
Fg5151 31.Mar.06 N
Fg5151 30.Apr.06 N
Fg5151 31.May.06 Y
Fg5151 30.Jun.06 N
Fg5151 30.Sep.06 N
Fg5151 31.Oct.06 N
Fg5151 30.Nov.06 N
Fg5151 31.Dec.06 N
Fg5151 01.Jan.07 N
A101 30.Apr.06 Y
A101 28.Feb.06 N
A101 31.Mar.06 N
A101 31.May.06 N
A101 30.Jun.06 N
A101 31.Jul.06 N
ABC123 31.Mar.06 N
;


data have;
format dates ddmmyy10.;
set have;
dates=put(dates,best9.);
run;

data _null_;
	set have end=last;
	if _n_ = 1 then do;
		length newdate 8 answer 8 c 8;
		format newdate ddmmyy10.;
		declare hash hs(ordered: "Y", hashexp: 9);
		hs.defineKey("client","dates");
		hs.defineData("client","dates","filter","answer","c");
		hs.defineDone();
	end;
	rc = hs.find();
	if rc ne 0 then do;
		retain answer c;
		if _n_=1 or lag(client) ne client then do;
			answer=0;
			c=0;
		end;
		if filter="Y" then do;
			answer=1;
			c=1;
		end;
		else if answer=1 then c=c+1;
		if c > 6 then do;
			answer=0;
			c=0;
/*			hs.replace();*/
		end;
		hs.replace();
	end;
	if last eq 1 then do;
hs.output(dataset:
		"not_working");
	end;
run;

Table "not_working" is the output of the operation. 

 

Table "want" is what i am after:


data want;
input client $ dates date9. filter $ answer $;
datalines ;
A101 28.Feb.06 N 0
A101 31.Mar.06 N 0
A101 30.Apr.06 Y 1
A101 31.May.06 N 1
A101 30.Jun.06 N 1
A101 31.Jul.06 N 1
ABC123 31.Mar.06 N 0
Fg5151 28.Feb.06 N 0
Fg5151 31.Mar.06 N 0
Fg5151 30.Apr.06 N 0
Fg5151 31.May.06 Y 1
Fg5151 30.Jun.06 N 1
Fg5151 31.Jul.06 Y 1
Fg5151 31.Aug.06 N 1
Fg5151 30.Sep.06 N 1
Fg5151 31.Oct.06 N 1
Fg5151 30.Nov.06 N 1
Fg5151 31.Dec.06 N 1
Fg5151 01.Jan.07 N 0
;

Many thanks.

 

regards,

Sebastian

Ksharp
Super User

data have;
input client $ dates : date9. filter $;
n+1;
format dates date9.;
datalines ;
Fg5151 28.Feb.06 N
Fg5151 31.Mar.06 N
Fg5151 30.Apr.06 N
Fg5151 31.May.06 Y
Fg5151 30.Jun.06 N
Fg5151 31.Jul.06 Y
Fg5151 31.Aug.06 N
Fg5151 30.Sep.06 N
Fg5151 31.Oct.06 N
Fg5151 30.Nov.06 N
Fg5151 31.Dec.06 N
Fg5151 01.Jan.07 N
A101 28.Feb.06 N
A101 31.Mar.06 N
A101 30.Apr.06 Y
A101 31.May.06 N
A101 30.Jun.06 N
A101 31.Jul.06 N
ABC123 31.Mar.06 N
;
run;
data key;
 set have(keep=client filter n where=(filter='Y'));
 output;
 do i=1 to 5;
  n=n+1;output;
 end;
 drop i;
run;
data want;
 if _n_=1 then do;
  declare hash h(dataset:'key',hashexp:20);
  h.definekey('client','n');
  h.definedone();
 end;
set have;
if h.check()=0 then answer=1;
 else answer=0;
run;

sebster24
Quartz | Level 8
Hello Xia Keshan (KSharp),
Instead of picking up next 6 observations, what if I want observations in the next 6 months?
I created a post: https://communities.sas.com/t5/Base-SAS-Programming/sas-enumeration-variable-for-next-6-months/m-p/2...
thank you so much.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 2083 views
  • 1 like
  • 3 in conversation