DATA Step, Macro, Functions and more

Datastep / SQL in one step(more efficient code)

Accepted Solution Solved
Reply
Contributor
Posts: 63
Accepted Solution

Datastep / SQL in one step(more efficient code)

Hi all,

 

I have a dataset and SQL below, this is the best i can do, I know it can be done in one step of Datastep or SQL. Need advice how to do it in one step(Datastep or SQL) instead of mine by 2 step.

 

data have;
infile cards dlm='|';
input ACC_NO SUB_ACCNO PRODUCT$ ACC_PLUS_PRODUCT :$20. CATEGORY $ AMT;
cards;
11111 | 12345 | GOLD | 11111GOLD | B | 50
11111 | 24895 | BLACK | 11111BLACK | B | 25
11111 | 24445 | BLACK | 11111BLACK | B | 75
11111 | 24445 | SILVER | 11111SILVER | B | 140
11111 | 23456 | SILVER | 11111SILVER | S | 70
11111 | 86475 | GOLD | 11111GOLD | S | 130
22222 | 34567 | SILVER | 22222SILVER | B | 100
22222 | 12458 | GOLD | 22222GOLD | B | 50
22222 | 12557 | GOLD | 22222GOLD | S | 100
33333 | 23785 | GOLD | 33333GOLD | B | 75
33333 | 12968 | SILVER | 33333SILVER | B | 110
33333 | 12968 | SILVER | 33333SILVER | S | 110
33333 | 54368 | SILVER | 33333SILVER | S | 50
33333 | 94256 | SILVER | 33333SILVER | S | 175
;

proc sql;
	create table flow_list_remove_duplicate(drop=AMT) as
	select *, 
	SUM(AMT) as count_AMT
	from have
	group by ACC_NO ;
quit;

proc sort data=work.flow_list_remove_duplicate;
	by ACC_NO;
run;

DATA FINAL;
	SET WORK.flow_list_remove_duplicate;
	by ACC_NO;
	if first.ACC_NO then output;
run;
	

Thank in advance


Accepted Solutions
Solution
‎11-23-2017 10:20 AM
Super User
Super User
Posts: 9,790

Re: Datastep / SQL in one step(more efficient code)

In the context of your query the other variables like subaccno don't make much sense, if you take a distinct acc_no, and sum amount then that is the data row, the other variables could be any of the records as they don't get summed.  If you just want the last row in the group and a sum, then:

data want (drop=amt);
  set have;
  retain count_amt;
  by acc_no;
  if first.acc_no then count_amt=amt;
else count_amt=sum(count_amt,amt);
if last.acc_no;
run;

View solution in original post


All Replies
Contributor
Posts: 63

Re: Datastep / SQL in one step(more efficient code)

Expected result as below:-

 

ObsACC_NOSUB_ACCNOPRODUCTACC_PLUS_PRODUCTCATEGORYcount_AMT
11111186475GOLD11111GOLDS490
22222234567SILVER22222SILVERB250
33333394256SILVER33333SILVERS520
Solution
‎11-23-2017 10:20 AM
Super User
Super User
Posts: 9,790

Re: Datastep / SQL in one step(more efficient code)

In the context of your query the other variables like subaccno don't make much sense, if you take a distinct acc_no, and sum amount then that is the data row, the other variables could be any of the records as they don't get summed.  If you just want the last row in the group and a sum, then:

data want (drop=amt);
  set have;
  retain count_amt;
  by acc_no;
  if first.acc_no then count_amt=amt;
else count_amt=sum(count_amt,amt);
if last.acc_no;
run;
Contributor
Posts: 63

Re: Datastep / SQL in one step(more efficient code)

[ Edited ]

Hi RW9,

 

Thank for the code. Would appreciated if you could give guidance on the Proc SQL as well(if you're convenience).

 

 

Super User
Super User
Posts: 9,790

Re: Datastep / SQL in one step(more efficient code)

In what sense sorry?  For the summation, SQL looks at the group by columns, so only takes ACC_NO and AMT, as these are the only items which relate to the summary statistic.  As for trying to get the last record out within SQL, this tends to be trickier.  SQL does not have the concept of a series of observations, it only looks at logical blocks of observations.  So you would need to have a logical way of finding the last record, maybe looking at your test data: where SUB_ACCNO=max(SUB_ACCNO).  Anyways, for this it is far simpler to do in datastep.

Contributor
Posts: 63

Re: Datastep / SQL in one step(more efficient code)

Hi RW9,

 

You're absolute right, this is actually a test data(apologize for the poor test data). At the moment, there is no any criteria need to select which data after the summation done. I believe next will be asking criteria like last record, latest date(which i didn't put in), max/min or etc.

 

Thank again for your guidance, with sample code of this "where SUB_ACCNO=max(SUB_ACCNO)", at least I have idea how to do it in  Datastep and SQL as well.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 168 views
  • 0 likes
  • 2 in conversation