proc sql;
create table dataset2 as
select var1,
mean(var2) as average,
(select sum(var3) from dataset1 where var2>=1 group by var1),
(select sum(var3) from dataset1 where var2>=1 group by var1)
from dataset1
group by var1;
quit;
ERROR: SUBQUERY EVALUATED TO MORE THAN ONE ROW
Hi everyone,
I am new to SAS and I'd like to understand about subqueries. So what I want is to have a table with 4 columns:
Therefore I want the "where" statement to trigger only for the last two columns. The subquery works if I dont add the "group by" but in that case I am not able to have the breakdown by var1. On the other hand if I add the "group by" in the subquery I get this error: Subquery evaluated to more than one row.
I know that I could obtain the same table with joins but I really would like to understand better how to exploit subqueries.
Thanks in advance for your help.
I should add that no subquery or join is needed to perform that operation. You can simply do:
proc sql;
create table dataset2 as
select var1,
mean(var2) as average,
sum(var3 * (var2>=1)) as var2Sup1,
sum(var3 * (var2<1)) as var2Inf1
from dataset1
group by var1;
quit;
Since you want Var1 to be the same in the main query and the subquery, you must explicitly correlate the subqueries with the main query. Try:
proc sql;
create table dataset2 as
select var1,
mean(var2) as average,
(select sum(var3) from dataset1 where var1=A.var1 and var2>=1 group by var1) as var2Sup1,
(select sum(var3) from dataset1 where var1=A.var1 and var2<1 group by var1) as var2Inf1
from dataset1 as A
group by var1;
quit;
I should add that no subquery or join is needed to perform that operation. You can simply do:
proc sql;
create table dataset2 as
select var1,
mean(var2) as average,
sum(var3 * (var2>=1)) as var2Sup1,
sum(var3 * (var2<1)) as var2Inf1
from dataset1
group by var1;
quit;
I chose this answer as solution but I'd like to understand how the formula exactly work.
sum(var3 * (var2>=1)) as var2Sup1
it sums var3 by var1 as I want but I dont understand how the multiplication works. Is (var2>=1) a dummy variable?
For an overview of the types of applications that subqueries support, see this topic about subqueries in SAS Enterprise Guide. If you have SAS Enterprise Guide you can use point-and-click methods to generate some syntax you might learn from. BUT, the SAS Enterprise Guide methods don't support correlated subqueries -- which @PGStats just helped you with.
In EG, here are the types of subqueries that you can create using query-based templates:
Here are the types of subqueries that you cannot create using query-based templates:
proc sql;
create table dataset2 as
select var1,
mean(var2) as average,
(select sum(var3) from dataset1 where var2>=1 and var1=a.var1),
(select sum(var3) from dataset1 where var2>=1 and var1=a.var1)
from dataset1 as a
group by var1;
quit;
Also this code is good. Thank you 🙂 I dont know how to select more than a solution to the post
Unless I misunderstand the data structure, I think the way I would do this is:
proc sql;
create table dataset2 as
select var1,
mean(var2) as average,
sum( case when var2>=1 then var3 else 0 end) as total1,
sum(case when var2<1 then var3 else 0 end) as total2
from dataset1
group by var1;
quit;
Dan Keating
Hey @NewsGuy, I think your approach is the same as @PGStats, except that you used a CASE instead of relying on the fact that an equality check (var2>=1) will resolve to a 1 or a 0, depending on the value of var2. Your approach is more explicit, while @PGStats approach is more elegant (even if it takes a moment to think it through).
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.