Solved: Data Selection

twildone · Posted 06-23-2016 08:01 PM

Hi....I have a dataset where I would like to select or keep only those records(rows) where either the First_Choice or Second_Choice is greater than zero (0) and only keep the records(rows) where the Third_Choice=1 if that record(row) is unique (appears only once) grouped by Store_ID and Customer_ID. Thanks.

Have:

Store_ID	Customer _Number	Product_Number	First_Choice	Second_Choice	Third_Choice
1231	23345	102011	3	0	0
1231	23345	102333	0	0	1
1231	23345	102441	0	1	0
1231	24898	101457	0	0	1
1231	24898	149845	0	0	1
1354	28779	112786	1	2	0
1354	28779	122654	0	2	0
1354	28779	125234	0	0	1
1354	28779	145889	1	0	0

Want:

Store_ID	Customer _Number	Product_Number	First_Choice	Second_Choice	Third_Choice
1231	23345	102011	3	0	0
1231	23345	102441	0	1	0
1231	24898	101457	0	0	1
1354	28779	112786	1	2	0
1354	28779	122654	0	2	0
1354	28779	145889	1	0	0

Ksharp · Posted 06-23-2016 09:25 PM

"if that record(row) is unique (appears only once) grouped by Store_ID and Customer_ID."

You mean Store_ID and Customer_ID appears only once, right?

data have;
infile cards expandtabs truncover;
input Store_ID	Customer_Number	Product_Number	First_Choice	Second_Choice	Third_Choice;
cards;
1231	23345	102011	3	0	0
1231	23345	102333	0	0	1
1231	23345	102441	0	1	0
1231	24898	101457	0	0	1
1231	24898	149845	0	0	1
1354	28779	112786	1	2	0
1354	28779	122654	0	2	0
1354	28779	125234	0	0	1
1354	28779	145889	1	0	0
;
run;
data key;
set have;
if First_Choice>0 or Second_Choice>0;
run;

data want;
 if _n_=1 then do;
  if 0 then set key;
  declare hash h(dataset:'key');
  h.definekey('Store_ID','Customer_Number');
  h.definedone();
 end;
set have;

if First_Choice>0 or Second_Choice>0 or 
  (Third_Choice=1 and h.check() ne 0) then do;
  output;h.replace();
end;
run;

View solution in original post

Ksharp · Posted 06-23-2016 09:25 PM

"if that record(row) is unique (appears only once) grouped by Store_ID and Customer_ID."

You mean Store_ID and Customer_ID appears only once, right?

data have;
infile cards expandtabs truncover;
input Store_ID	Customer_Number	Product_Number	First_Choice	Second_Choice	Third_Choice;
cards;
1231	23345	102011	3	0	0
1231	23345	102333	0	0	1
1231	23345	102441	0	1	0
1231	24898	101457	0	0	1
1231	24898	149845	0	0	1
1354	28779	112786	1	2	0
1354	28779	122654	0	2	0
1354	28779	125234	0	0	1
1354	28779	145889	1	0	0
;
run;
data key;
set have;
if First_Choice>0 or Second_Choice>0;
run;

data want;
 if _n_=1 then do;
  if 0 then set key;
  declare hash h(dataset:'key');
  h.definekey('Store_ID','Customer_Number');
  h.definedone();
 end;
set have;

if First_Choice>0 or Second_Choice>0 or 
  (Third_Choice=1 and h.check() ne 0) then do;
  output;h.replace();
end;
run;

twildone · Posted 06-23-2016 09:39 PM

Hi Xia....thanks you for your quick response.

"if that record(row) is unique (appears only once) grouped by Store_ID and Customer_ID."

You mean Store_ID and Customer_ID appears only once, right?

What I meant was that a record(row) could be the unique grouped by Store_ID and Customer_ID with Third_Choice = 1, then that record(row) is to be kept. If not and all the entries for Third_Choice is equal to 1 for the same Store_ID and Customer_ID, then only 1 (anyone of how many records) of these records(rows) is kept. I hope that clarifies things a little better. Thanks.

Jagadishkatam · Posted 06-23-2016 09:43 PM

please try the DOW loop approach

data have;
	infile cards missover;
	input Store_ID Customer_Number Product_Number First_Choice Second_Choice 
		Third_Choice;
	cards;
1231     23345     102011     3     0     0
1231     23345     102333     0     0     1
1231     23345     102441     0     1     0
1231     24898     101457     0     0     1
1231     24898     149845     0     0     1
1354     28779     112786     1     2     0
1354     28779     122654     0     2     0
1354     28779     125234     0     0     1
1354     28779     145889     1     0     0
;

proc sort data=have;
	by Store_ID Customer_Number Third_Choice;
	where First_Choice>0 or Second_Choice >0 or Third_Choice=1;
run;

data want;
	i=0;

	do until(last.Third_Choice);
		set have;
		by Store_ID Customer_Number Third_Choice;
		i+1;
	end;

	do until(last.Third_Choice);
		set have;
		by Store_ID Customer_Number Third_Choice;

		if (First_Choice>0 or Second_Choice >0) or (Third_Choice=1 and i=1);
		output;
	end;
run;

Thanks,
Jag

Data Selection

Re: Data Selection

Re: Data Selection

Re: Data Selection

Re: Data Selection

Data Selection

Re: Data Selection

Re: Data Selection

Re: Data Selection

Re: Data Selection

Click image to register for webinar

Classroom Training Available!