BookmarkSubscribeRSS Feed
juliajulia
Obsidian | Level 7

Hello experts,

 

I try to use proc sql to create a new variable from the original variable p1. the new variable (p2) is aggregated variable by people's id and their area. my original data like in table 1 and the result that i want is like in table 2:

table 1

id year area s1 o1 P1
1 2020     0 1
1 2020 NEW YORK   0 0
1 2020 NEW YORK 1 1 0

 

table 2

ID area year O2 S2 P2
1   2020 0 0 1
1 NY 2020 1 1 0

 

the sas code i used is:
proc sql;
create table data.table2
as select distinct id, area, year, s1, 
case when sum(p1)>=1 then 1 else 0 end as p2,

case when sum(o1)>=1 then 1 else 0 end as o2,

case when sum(s1)>=1 then 1 else 0 end as s2
from data.table1
group by id , area;
quit;

 the P2 (in red color) suppose to be 0 but my result is 1. i don't know what's wrong with my code. please help.

ID area   O2 S2 P2
1   2020 0 0 1
1 NY 2020 1 1 1
3 REPLIES 3
PaigeMiller
Diamond | Level 26

Could you please provide the data (for this thread, as well as for all future threads) as working SAS data step code, like you did in this thread? Thank you.

--
Paige Miller
s_lassen
Meteorite | Level 14

I think the problem is the SUM() function. In normal SAS, it will just add the non-missing parameters to 0, but in SQL it takes the sum of the column over the whole table (or the group, if you use GROUP BY).

 

I assume you used the SUM() function to account for missing values. In SAS SQL, that is not necessary for your comparison, as missing (.) is considered smaller than any other value. In most other SQL dialects, though, the result is different: a comparison involving a missing value always returns FALSE - in that case you may need to correct for missing values - except that your CASE statement would then default to the ELSE clause, so you would also get the right result there without doing anything special to account for missing values.

 

However, if you really need to correct missing values in an SQL comparison, you should use the COALESCE() function, e.g.:

case when coalesce(p1,0)>=1 then 1 else 0 end as p2,

But in your actual code, both SAS and standard SQL should give the answer you want with just

case when p1>=1 then 1 else 0 end as p2,

for the reasons stated above.

 

Tom
Super User Tom
Super User

Why are you including AREA in the list of variables output, but not in the list of GROUP BY variables?

That will force SAS to remerge the overall summary by ID and AREA back onto the individual observations.

NOTE: The query requires remerging summary statistics back with the original data.
NOTE: Table WORK.WANT created, with 3 rows and 6 columns.

Make sure to match the GROUP BY variables to what you want to actually sum().

data have;
  infile cards dsd dlm='|' truncover;
  input id year area :$12. s1 o1 P1;
cards;
1|2020| | |0|1
1|2020|NEW YORK| |0|0
1|2020|NEW YORK|1|1|0
;

proc sql;
create table want as 
select id,area,year
     , sum(s1)>0 as s2
     , sum(o1)>0 as o2
     , sum(p1)>0 as p2
 from have
 group by id,area,year
;
quit;

Tom_0-1670608841817.png

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 876 views
  • 0 likes
  • 4 in conversation