DATA Step, Macro, Functions and more

Data Selection

Accepted Solution Solved
Reply
Regular Contributor
Posts: 229
Accepted Solution

Data Selection

Hi....I have a dataset where I would like to select or keep only those records(rows) where either the First_Choice or Second_Choice is greater than zero (0) and only keep the records(rows) where the Third_Choice=1 if that record(row) is unique (appears only once) grouped by Store_ID and Customer_ID. Thanks.

 

 

Have:

 

Store_ID Customer _Number Product_Number First_Choice Second_Choice Third_Choice
1231 23345 102011 3 0 0
1231 23345 102333 0 0 1
1231 23345 102441 0 1 0
1231 24898 101457 0 0 1
1231 24898 149845 0 0 1
1354 28779 112786 1 2 0
1354 28779 122654 0 2 0
1354 28779 125234 0 0 1
1354 28779 145889 1 0 0

 

 

Want:

 

Store_ID Customer _Number Product_Number First_Choice Second_Choice Third_Choice
1231 23345 102011 3 0 0
1231 23345 102441 0 1 0
1231 24898 101457 0 0 1
1354 28779 112786 1 2 0
1354 28779 122654 0 2 0
1354 28779 145889 1 0 0

Accepted Solutions
Solution
‎06-24-2016 07:22 PM
Super User
Posts: 10,018

Re: Data Selection

 

"if that record(row) is unique (appears only once) grouped by Store_ID and Customer_ID."

You mean Store_ID and Customer_ID appears only once, right?

 

data have;
infile cards expandtabs truncover;
input Store_ID	Customer_Number	Product_Number	First_Choice	Second_Choice	Third_Choice;
cards;
1231	23345	102011	3	0	0
1231	23345	102333	0	0	1
1231	23345	102441	0	1	0
1231	24898	101457	0	0	1
1231	24898	149845	0	0	1
1354	28779	112786	1	2	0
1354	28779	122654	0	2	0
1354	28779	125234	0	0	1
1354	28779	145889	1	0	0
;
run;
data key;
set have;
if First_Choice>0 or Second_Choice>0;
run;

data want;
 if _n_=1 then do;
  if 0 then set key;
  declare hash h(dataset:'key');
  h.definekey('Store_ID','Customer_Number');
  h.definedone();
 end;
set have;

if First_Choice>0 or Second_Choice>0 or 
  (Third_Choice=1 and h.check() ne 0) then do;
  output;h.replace();
end;
run;

View solution in original post


All Replies
Solution
‎06-24-2016 07:22 PM
Super User
Posts: 10,018

Re: Data Selection

 

"if that record(row) is unique (appears only once) grouped by Store_ID and Customer_ID."

You mean Store_ID and Customer_ID appears only once, right?

 

data have;
infile cards expandtabs truncover;
input Store_ID	Customer_Number	Product_Number	First_Choice	Second_Choice	Third_Choice;
cards;
1231	23345	102011	3	0	0
1231	23345	102333	0	0	1
1231	23345	102441	0	1	0
1231	24898	101457	0	0	1
1231	24898	149845	0	0	1
1354	28779	112786	1	2	0
1354	28779	122654	0	2	0
1354	28779	125234	0	0	1
1354	28779	145889	1	0	0
;
run;
data key;
set have;
if First_Choice>0 or Second_Choice>0;
run;

data want;
 if _n_=1 then do;
  if 0 then set key;
  declare hash h(dataset:'key');
  h.definekey('Store_ID','Customer_Number');
  h.definedone();
 end;
set have;

if First_Choice>0 or Second_Choice>0 or 
  (Third_Choice=1 and h.check() ne 0) then do;
  output;h.replace();
end;
run;
Regular Contributor
Posts: 229

Re: Data Selection

Hi Xia....thanks you for your quick response.

 

"if that record(row) is unique (appears only once) grouped by Store_ID and Customer_ID."

You mean Store_ID and Customer_ID appears only once, right?

 

What I meant was that a record(row) could be the unique grouped by Store_ID and Customer_ID with Third_Choice = 1, then that record(row) is to be kept. If not and all the entries for Third_Choice is equal to 1 for the same Store_ID and Customer_ID, then only 1 (anyone of how many records) of these records(rows) is kept. I hope that clarifies things a little better. Thanks.

Trusted Advisor
Posts: 1,137

Re: Data Selection

[ Edited ]

please try the DOW loop approach

 

data have;
	infile cards missover;
	input Store_ID Customer_Number Product_Number First_Choice Second_Choice 
		Third_Choice;
	cards;
1231     23345     102011     3     0     0
1231     23345     102333     0     0     1
1231     23345     102441     0     1     0
1231     24898     101457     0     0     1
1231     24898     149845     0     0     1
1354     28779     112786     1     2     0
1354     28779     122654     0     2     0
1354     28779     125234     0     0     1
1354     28779     145889     1     0     0
;

proc sort data=have;
	by Store_ID Customer_Number Third_Choice;
	where First_Choice>0 or Second_Choice >0 or Third_Choice=1;
run;

data want;
	i=0;

	do until(last.Third_Choice);
		set have;
		by Store_ID Customer_Number Third_Choice;
		i+1;
	end;

	do until(last.Third_Choice);
		set have;
		by Store_ID Customer_Number Third_Choice;

		if (First_Choice>0 or Second_Choice >0) or (Third_Choice=1 and i=1);
		output;
	end;
run;

 

Thanks,
Jag
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 225 views
  • 0 likes
  • 3 in conversation