Solved: Datastep / SQL in one step(more efficient code)

sagulolo · Posted 11-23-2017 08:52 AM

Hi all,

I have a dataset and SQL below, this is the best i can do, I know it can be done in one step of Datastep or SQL. Need advice how to do it in one step(Datastep or SQL) instead of mine by 2 step.

data have;
infile cards dlm='|';
input ACC_NO SUB_ACCNO PRODUCT$ ACC_PLUS_PRODUCT :$20. CATEGORY $ AMT;
cards;
11111 | 12345 | GOLD | 11111GOLD | B | 50
11111 | 24895 | BLACK | 11111BLACK | B | 25
11111 | 24445 | BLACK | 11111BLACK | B | 75
11111 | 24445 | SILVER | 11111SILVER | B | 140
11111 | 23456 | SILVER | 11111SILVER | S | 70
11111 | 86475 | GOLD | 11111GOLD | S | 130
22222 | 34567 | SILVER | 22222SILVER | B | 100
22222 | 12458 | GOLD | 22222GOLD | B | 50
22222 | 12557 | GOLD | 22222GOLD | S | 100
33333 | 23785 | GOLD | 33333GOLD | B | 75
33333 | 12968 | SILVER | 33333SILVER | B | 110
33333 | 12968 | SILVER | 33333SILVER | S | 110
33333 | 54368 | SILVER | 33333SILVER | S | 50
33333 | 94256 | SILVER | 33333SILVER | S | 175
;

proc sql;
	create table flow_list_remove_duplicate(drop=AMT) as
	select *, 
	SUM(AMT) as count_AMT
	from have
	group by ACC_NO ;
quit;

proc sort data=work.flow_list_remove_duplicate;
	by ACC_NO;
run;

DATA FINAL;
	SET WORK.flow_list_remove_duplicate;
	by ACC_NO;
	if first.ACC_NO then output;
run;

Thank in advance

RW9 · Posted 11-23-2017 09:11 AM

In the context of your query the other variables like subaccno don't make much sense, if you take a distinct acc_no, and sum amount then that is the data row, the other variables could be any of the records as they don't get summed. If you just want the last row in the group and a sum, then:

data want (drop=amt);
  set have;
  retain count_amt;
  by acc_no;
  if first.acc_no then count_amt=amt;
  else count_amt=sum(count_amt,amt);
  if last.acc_no;
run;

View solution in original post

sagulolo · Posted 11-23-2017 08:57 AM

Expected result as below:-

Obs	ACC_NO	SUB_ACCNO	PRODUCT	ACC_PLUS_PRODUCT	CATEGORY	count_AMT
1	11111	86475	GOLD	11111GOLD	S	490
2	22222	34567	SILVER	22222SILVER	B	250
3	33333	94256	SILVER	33333SILVER	S	520

RW9 · Posted 11-23-2017 09:11 AM

In the context of your query the other variables like subaccno don't make much sense, if you take a distinct acc_no, and sum amount then that is the data row, the other variables could be any of the records as they don't get summed. If you just want the last row in the group and a sum, then:

data want (drop=amt);
  set have;
  retain count_amt;
  by acc_no;
  if first.acc_no then count_amt=amt;
  else count_amt=sum(count_amt,amt);
  if last.acc_no;
run;

sagulolo · Posted 11-23-2017 10:24 AM

Hi RW9,

Thank for the code. Would appreciated if you could give guidance on the Proc SQL as well(if you're convenience).

RW9 · Posted 11-23-2017 10:36 AM

In what sense sorry? For the summation, SQL looks at the group by columns, so only takes ACC_NO and AMT, as these are the only items which relate to the summary statistic. As for trying to get the last record out within SQL, this tends to be trickier. SQL does not have the concept of a series of observations, it only looks at logical blocks of observations. So you would need to have a logical way of finding the last record, maybe looking at your test data: where SUB_ACCNO=max(SUB_ACCNO). Anyways, for this it is far simpler to do in datastep.

sagulolo · Posted 11-23-2017 10:50 AM

Hi RW9,

You're absolute right, this is actually a test data(apologize for the poor test data). At the moment, there is no any criteria need to select which data after the summation done. I believe next will be asking criteria like last record, latest date(which i didn't put in), max/min or etc.

Thank again for your guidance, with sample code of this "where SUB_ACCNO=max(SUB_ACCNO)", at least I have idea how to do it in Datastep and SQL as well.

Datastep / SQL in one step(more efficient code)

Re: Datastep / SQL in one step(more efficient code)

Re: Datastep / SQL in one step(more efficient code)

Re: Datastep / SQL in one step(more efficient code)

Re: Datastep / SQL in one step(more efficient code)

Re: Datastep / SQL in one step(more efficient code)

Re: Datastep / SQL in one step(more efficient code)

Catch up on SAS Innovate 2026

Datastep / SQL in one step(more efficient code)

Re: Datastep / SQL in one step(more efficient code)

Re: Datastep / SQL in one step(more efficient code)

Re: Datastep / SQL in one step(more efficient code)

Re: Datastep / SQL in one step(more efficient code)

Re: Datastep / SQL in one step(more efficient code)

Re: Datastep / SQL in one step(more efficient code)

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away