Aggregation without creating the new dataset

David_Billa · Posted 04-28-2023 01:58 PM

I've the following code which creates WANT dataset. Now I want to update the WANT dataset by doing sum(sales) and group by of other fields. I can do this in proc sql with create table statement but I don't want to create any new dataset and instead I want to update the existing WANT dataset with the aggregation as told above. Any help?

data work.want;
set work.have;     
      upn = material;
      country = propcase(country_name);
      mon_intrvl = put(date, yymmd7.);
      sales_date = intnx("month",date,0,"b");
      sales = delivery_qty;
  run;

ballardw · Posted 04-28-2023 04:04 PM

Artificial restriction like "I don't want to create any new dataset " don't lend themselves to clean easy to understand code.

Plus, without an example data set of what this is supposed to look like after the "update" I am not sure exactly what you are expecting as a result.

"Group by" isn't really a data step concept either. So an example gets more important.

Typically I would look at using something like Proc Summary to create the summaries and then merge back to this data.

If you can do it in SQL then why not?

In SAS terms any time you add a variable you are not actually updating a data set, you are creating a new set that may just happen to have the same name. That is what your example data step code does as well.

David_Billa · Posted 04-28-2023 04:16 PM

I don't know how to do it in sql to update and aggregate the WANT dataset
in one step.

It would be nice if you can point me the document where I can refer to
update and aggregate the table in proc summary and data step

Kurt_Bremser · Posted 04-29-2023 02:04 AM

@David_Billa wrote:
I don't know how to do it in sql to update and aggregate the WANT dataset
in one step.

It would be nice if you can point me the document where I can refer to
update and aggregate the table in proc summary and data step

Using SQL:

proc sql;
create table class as
  select *, sum(weight) as weightsum
  from sashelp.class
  group by sex
;
quit;

PROC SUMMARY:

proc summary data=sashelp.class nway;
class sex;
var weight;
output out=class_sum sum(weight)=weightsum;
run;

data class;
set sashelp.class;
if _n_ = 1
then do;
  length weightsum 8;
  declare hash s (dataset:"class_sum");
  s.definekey("sex");
  s.defineedata("weightsum");
  s.definedone();
  weightsum = .;
end;
rc = s.find();
drop rc;
run;

proc delete data=class_sum;
run;

While the SQL will re-order the data, the second method will keep the original order.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

David_Billa · Posted 04-29-2023 02:09 AM

Good to know about this method. Is it possible to achieve this with only
data step without using any procs? Just curious how to handle it only in
data step when we want to group by multiple variables.

Kurt_Bremser · Posted 04-29-2023 02:39 AM

Since all grouping implies some kind of sorting, you either need a preceding sorting step, or your dataset has to fit into available memory, so you can use a hash object. Be aware that the OUTPUT() method will write the new dataset in order of the key variable(s).

Either @DonH or @hashman might be able to show us a way how to use a hash object to create a sum by one variable, but order by another.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

DonH · Posted 04-30-2023 09:20 PM

I have to respectfully disagree with

Since all grouping implies some kind of sorting, you either need a preceding sorting step, or your dataset has to fit into available memory

The hash object does not require the source data to fit into memory. It can (but need not) require the summary table to fit into memory. The book and articles that were noted by @yabwon earlier provide details on memory management issues.

Sorting is not required for the hash object solutions described in @yabwon's comments.

I assume on your point about grouping by one variable but ordering by another, you are talking about the case where the order variable is not a grouping variable. If you can clarify I can perhaps provide an example. Regardless, unless the summary table is large, I don't see why a sort after the aggregation is a big deal

AhmedAl_Attar · Posted 04-29-2023 12:07 PM

@David_Billa

For generating aggregation within the Data Step, I would recommend looking into

- The DOW Loop

Additional papers can be found here

- Using Data Aggregation Using the SAS® Hash Object

Hope this helps

PaigeMiller · Posted 04-28-2023 04:21 PM

No such thing as updating an existing data set and at the same time creating a new column.

As @ballardw said, you can create a brand new data set that has the changes you want and has the same name as the previous data set (highly not recommended, it is a poor programming practice, but yes you can do that).

--
Paige Miller

Reeza · Posted 04-28-2023 04:24 PM

proc sql;
create table want as
select *, material as upn, propcase(country_name) as country,
         put(date, yymmd7.) as mon_intrvl, intnx('month', date, 0, 'b') as sales_date,
         delivery_qty as sales,  
        sum(delivery_qty) as total
from have
group by col1, col2, col3;
quit;

Is there a reason something like above wouldn't work?

David_Billa · Posted 04-28-2023 04:37 PM

There were some attrib statements and format statements between data and
set lines from the initial post which I have not shown. In that case how to
tweak your proc sql?

If I want to handle this in data step I wish to know how to do it given
that we have to 'group by' few fields after sum.

Whether your proc sql works if 'from' and 'create table' table name is same?

Reeza · Posted 04-28-2023 04:54 PM

@David_Billa wrote:
There were some attrib statements and format statements between data and
set lines from the initial post which I have not shown. In that case how to
tweak your proc sql?

Same way you would specify the attributes in any SQL, via column modifiers.

select put(....) as new_variable format = $8. length = $8.

If I want to handle this in data step I wish to know how to do it given
that we have to 'group by' few fields after sum.

It depends on the specifics. You have a few ways of doing this but all essentially recreate the table behind the scenes anyway. An exception to this may be if you're using a DB to store the data and writing directly to it. The code doesn't show the re-creation but it happens.

The SQL supports the group by without summarizing the data as long as you have some fields selected that are unique. It will not summarize the data.

Some options:

DoW loop to summarize (still has two passes of data)
Add empty column, summarize data via Summary/SQL/Means and "update" table

Whether your proc sql works if 'from' and 'create table' table name is same?

Yes, but it warns you this isn't a good idea and can lead to problems, which is why it's not recommended by anyone here. Just because you can, doesn't mean you should type deal.

data class;
set sashelp.class;
run;

proc sql;
create table class as 
select *, put(age, 8.) as age_char format=$8.,
         mean(weight) as avg_weight_by_gender
from class
group by sex, age;
quit;

Kurt_Bremser · Posted 04-29-2023 01:54 AM

Your step does create a new dataset. During execution, the file want.sas7bdat is read, and the results are written to want.sas7bdat.lck. When the step finishes successfully, want.sas7bdat is deleted, and want.sas7bdat.lck renamed to want.sas7bdat.

Anytime a new variable is added, or observations are inserted (not appended at the "end"), the whole dataset must be rewritten.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Aggregation without creating the new dataset

Re: Aggregation without creating the new dataset

Re: Aggregation without creating the new dataset

Re: Aggregation without creating the new dataset

Re: Aggregation without creating the new dataset

Re: Aggregation without creating the new dataset

Re: Aggregation without creating the new dataset

Re: Aggregation without creating the new dataset

Re: Aggregation without creating the new dataset

Re: Aggregation without creating the new dataset

Re: Aggregation without creating the new dataset

Re: Aggregation without creating the new dataset

Re: Aggregation without creating the new dataset

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!