BookmarkSubscribeRSS Feed
David_Billa
Rhodochrosite | Level 12

I've the following code which creates WANT dataset. Now I want to update the WANT dataset by doing sum(sales) and group by of other fields. I can do this in proc sql with create table statement but I don't want to create any  new dataset and instead I want to update the existing WANT dataset with the aggregation as told above. Any help?

 

data work.want;
set work.have;     
      upn = material;
      country = propcase(country_name);
      mon_intrvl = put(date, yymmd7.);
      sales_date = intnx("month",date,0,"b");
      sales = delivery_qty;
  run;
12 REPLIES 12
ballardw
Super User

Artificial restriction like "I don't want to create any new dataset " don't lend themselves to clean easy to understand code.

Plus, without an example data set of what this is supposed to look like after the "update" I am not sure exactly what you are expecting as a result.

 

"Group by" isn't really a data step concept either. So an example gets more important.

Typically I would look at using something like Proc Summary to create the summaries and then merge back to this data.

If you can do it in SQL then why not?

In SAS terms any time you add a variable you are not actually updating a data set, you are creating a new set that may just happen to have the same name. That is what your example data step code does as well.

David_Billa
Rhodochrosite | Level 12
I don't know how to do it in sql to update and aggregate the WANT dataset
in one step.

It would be nice if you can point me the document where I can refer to
update and aggregate the table in proc summary and data step
Kurt_Bremser
Super User

@David_Billa wrote:
I don't know how to do it in sql to update and aggregate the WANT dataset
in one step.

It would be nice if you can point me the document where I can refer to
update and aggregate the table in proc summary and data step

Using SQL:

proc sql;
create table class as
  select *, sum(weight) as weightsum
  from sashelp.class
  group by sex
;
quit;

PROC SUMMARY:

proc summary data=sashelp.class nway;
class sex;
var weight;
output out=class_sum sum(weight)=weightsum;
run;

data class;
set sashelp.class;
if _n_ = 1
then do;
  length weightsum 8;
  declare hash s (dataset:"class_sum");
  s.definekey("sex");
  s.defineedata("weightsum");
  s.definedone();
  weightsum = .;
end;
rc = s.find();
drop rc;
run;

proc delete data=class_sum;
run;

While the SQL will re-order the data, the second method will keep the original order.

David_Billa
Rhodochrosite | Level 12
Good to know about this method. Is it possible to achieve this with only
data step without using any procs? Just curious how to handle it only in
data step when we want to group by multiple variables.
Kurt_Bremser
Super User

Since all grouping implies some kind of sorting, you either need a preceding sorting step, or your dataset has to fit into available memory, so you can use a hash object. Be aware that the OUTPUT() method will write the new dataset in order of the key variable(s).

Either @DonH or @hashman might be able to show us a way how to use a hash object to create a sum by one variable, but order by another.

DonH
Lapis Lazuli | Level 10

I have to respectfully disagree with

 

Since all grouping implies some kind of sorting, you either need a preceding sorting step, or your dataset has to fit into available memory

The hash object does not require the source data to fit into memory. It can (but need not) require the summary table to fit into memory.  The book and articles that were noted by @yabwon earlier provide details on memory management issues.

Sorting is not required for the hash object solutions described in @yabwon's comments.

I assume on your point about grouping by one variable but ordering by another, you are talking about the case where the order variable is not a grouping variable. If you can clarify I can perhaps provide an example. Regardless, unless the summary table is large, I don't see why a sort after the aggregation is a big deal

AhmedAl_Attar
Rhodochrosite | Level 12

@David_Billa 

For generating aggregation within the Data Step, I would recommend looking into

- The DOW Loop

Additional papers can be found here 

- Using Data Aggregation Using the SAS® Hash Object     

 

Hope this helps

PaigeMiller
Diamond | Level 26

No such thing as updating an existing data set and at the same time creating a new column.

 

As @ballardw said, you can create a brand new data set that has the changes you want and has the same name as the previous data set (highly not recommended, it is a poor programming practice, but yes you can do that).

--
Paige Miller
Reeza
Super User
proc sql;
create table want as
select *, material as upn, propcase(country_name) as country,
         put(date, yymmd7.) as mon_intrvl, intnx('month', date, 0, 'b') as sales_date,
         delivery_qty as sales,  
        sum(delivery_qty) as total
from have
group by col1, col2, col3;
quit;

Is there a reason something like above wouldn't work?

David_Billa
Rhodochrosite | Level 12
There were some attrib statements and format statements between data and
set lines from the initial post which I have not shown. In that case how to
tweak your proc sql?

If I want to handle this in data step I wish to know how to do it given
that we have to 'group by' few fields after sum.

Whether your proc sql works if 'from' and 'create table' table name is same?
Reeza
Super User

@David_Billa wrote:
There were some attrib statements and format statements between data and
set lines from the initial post which I have not shown. In that case how to
tweak your proc sql?



Same way you would specify the attributes in any SQL, via column modifiers.

select put(....) as new_variable format = $8. length = $8. 

If I want to handle this in data step I wish to know how to do it given
that we have to 'group by' few fields after sum.


It depends on the specifics. You have a few ways of doing this but all essentially recreate the table behind the scenes anyway. An exception to this may be if you're using a DB to store the data and writing directly to it. The code doesn't show the re-creation but it happens. 

The SQL supports the group by without summarizing the data as long as you have some fields selected that are unique. It will not summarize the data. 

Some options:

  • DoW loop to summarize (still has two passes of data)
  • Add empty column, summarize data via Summary/SQL/Means and "update" table



Whether your proc sql works if 'from' and 'create table' table name is same?

Yes, but it warns you this isn't a good idea and can lead to problems, which is why it's not recommended by anyone here. Just because you can, doesn't mean you should type deal. 

 

data class;
set sashelp.class;
run;

proc sql;
create table class as 
select *, put(age, 8.) as age_char format=$8.,
         mean(weight) as avg_weight_by_gender
from class
group by sex, age;
quit;

 

 

Kurt_Bremser
Super User

Your step does create a new dataset. During execution, the file want.sas7bdat is read, and the results are written to want.sas7bdat.lck. When the step finishes successfully, want.sas7bdat is deleted, and want.sas7bdat.lck renamed to want.sas7bdat.

Anytime a new variable is added, or observations are inserted (not appended at the "end"), the whole dataset must be rewritten.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1274 views
  • 0 likes
  • 7 in conversation