BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rfurtuna
Fluorite | Level 6

Hello everybody! I created a table with hockey players and a running total of their career points using the DATA step and an input table, but I need to recreate the same output using proc SQL, and I'm having a bit of trouble. 

 

Here's the code with the DATA step:

 

data mydata.career_points_data;
	set mydata.player_datamerged;
	where league="NHL";
	drop place_of_birth g a pim league;
	by player_id league_year;
	if first.player_id=1 then
		CAREER_POINTS=0;
	CAREER_POINTS+P;
run;

And I tried to create code with the same output with proc SQL to no avail:

 

proc sql;
	create table career_points_sql as select player_id, first_name, last_name, 
		date_of_birth, height_cm, weight_kg, shoots, primary_pos, gp, league_year, p, 
		ppg, CAREER_POINTS+P as CAREER_POINTS
		from mydata.player_datamerged where league="NHL" order by player_id, league_year;
quit; 

career sql ERROR LOG 2.png

Based on the log, I'm pretty sure the text I made red is what's causing issues, but I'm not sure how to do what I want in proc SQL. I attached my log, the errors start at line 36, where I ran the code with "CAREER_POINTS+P as CAREER_POINTS" commented out. 

 

So basically I'm trying to create a table with proc SQL that has an accumulating row that resets with every new player. I also attached the table that I created with the DATA step for reference; that is what I'm trying to recreate using proc SQL.

 

Any help is appreciated, 

 

Thanks!!

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Running totals are easier to code for and perform better using a SAS data step especially when compared to the SAS SQL flavour.

Other SQL implementations (i.e. for SQL Server or Oracle have language extensions like analytics functions which better allow to implement running totals).

Below a SAS SQL which will return the same result like your data step.

proc sql;
  create table want as
  select 
    *,
    (select sum(P) 
        from mydata.career_points_data i 
        where i.player_id=o.player_id 
              and i.league_year<=o.league_year
/*              and i.league="NHL"*/
      ) 
      as CAREER_POINTS_SQL
  from mydata.career_points_data o
/*  where league="NHL";*/
  order by player_id, league_year
  ;
quit;

 

View solution in original post

7 REPLIES 7
andreas_lds
Jade | Level 19

In future posts always post the log as text using the </> icon, screenshots are unnecessary hard to read. The code you have posted has no errors (afaik), but in the screenshot i see a comma before FROM - check the beginning of your comment!

Why do you need have to re-write the code as sql?

rfurtuna
Fluorite | Level 6
I need to compare data with proc sql in a school assignment. Thanks for the advice! It was my first time posting.
Kurt_Bremser
Super User

Creating a running total like that is very hard to do in SQL. Stay with your data step.

Creating summary observations only (one per player), OTOH, is easy in SQL.

ballardw
Super User

While you can eventually create SQL that will do something like that it will not be easy and may be pretty fragile (as in unexpected values breaking code that works for some specific set).

 

SQL is designed around operations on SETS of values with operations like Intersect and Union (in set operation terms). Sets as such do not have any particular order so the process of using data in a specific order can be quite difficult.

 

Consider the tool and the job. If you want to cut down a tree you want a saw, not a hammer. You can eventually pound enough bits off of a tree that it will fall down but it won't go quickly. Here you are using the hammer (SQL).

 

On the other hand the Data step does a very poor job of working with many to many data set merges but SQL is designed for that.

 

 

rfurtuna
Fluorite | Level 6
Interesting. Well the only reason I was trying to do it both ways is for a school project where I need to compare the two options, so even just the fact that it's much harder with SQL is good for a comparison. Thank you.
Patrick
Opal | Level 21

Running totals are easier to code for and perform better using a SAS data step especially when compared to the SAS SQL flavour.

Other SQL implementations (i.e. for SQL Server or Oracle have language extensions like analytics functions which better allow to implement running totals).

Below a SAS SQL which will return the same result like your data step.

proc sql;
  create table want as
  select 
    *,
    (select sum(P) 
        from mydata.career_points_data i 
        where i.player_id=o.player_id 
              and i.league_year<=o.league_year
/*              and i.league="NHL"*/
      ) 
      as CAREER_POINTS_SQL
  from mydata.career_points_data o
/*  where league="NHL";*/
  order by player_id, league_year
  ;
quit;

 

rfurtuna
Fluorite | Level 6
Thank you very much

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 2269 views
  • 4 likes
  • 5 in conversation