BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
casmcfarland
Calcite | Level 5

I am trying to create an aggregate dataset that will have a count of every time a person was stopped, frisked, etc within a census tract within a given month/year. Here is the code I am using and I should be going from 4 million cases to approximately 200K but it is not working and it is aggregating some but not all, so I still have duplicate month/year within census tracts instead of the sum of stops, etc. Please help!

 

proc sort data=sqfmonthyear;
by tract2010 sqfmthyr ;
run;

 

proc sql;
create table sqfagg as
select sqfmthyr, tract2010, count (*) as stopcount; sum(frisk) as friskcount format=comma10. , sum(furtive) as furtcount format=comma10. ,
sum(search) as searchcount format=comma10. , count(*) as stopcount format=comma10., sum(arrestmade) as arrestcount format=comma10.,
sum(summonissue) as summonscount format=comma10., sum(physicalforce) as phyfrc format=comma10.,
sum(criminaldress) format=comma10. as crimdress
from sqfmonthyear
group by tract2010, sqfmthyr ;
quit;

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Yeah, I think you're definitely running into the date issue then. You need to either use PUT() or convert to PROC MEANS.

Please don't post your data as images. We can't work with an image and then to test your code we'd have to type it out. It's not really helpful. Also, please post your code in a code box, not as free text. It loses all formatting that way.

You don't need to presort for SQL.

proc sql;
create table sqfagg as
select put(sqfmthyr, YYYYMM6.) as reporting_period,
 tract2010, count (*) as stopcount,
 sum(frisk) as friskcount format=comma10. , 
sum(furtive) as furtcount format=comma10. ,
sum(search) as searchcount format=comma10. , 
count(*) as stopcount format=comma10., 
sum(arrestmade) as arrestcount format=comma10.,
sum(summonissue) as summonscount format=comma10., 
sum(physicalforce) as phyfrc format=comma10.,
sum(criminaldress) format=comma10. as crimdress

from sqfmonthyear

group by tract2010, put(sqfmthyr, YYYYMM6.) ;
quit;

If this doesn't work please post the log from this code.

View solution in original post

10 REPLIES 10
PaigeMiller
Diamond | Level 26

What is not working about this? Show us.

 

If there is an error in the LOG, please SHOW US the log, showing the entire log for PROC SQL (we need to see the entire PROC SQL code as it appears in the log and we need to see the NOTEs and ERRORs and WARNINGs, with nothing chopped out, every single character)

--
Paige Miller
casmcfarland
Calcite | Level 5

Here is the log:


138
139 proc sql;
140 create table sqfagg as
141 select sqfmthyr, tract2010, count (*) as stopcount, sum(frisk) as friskcount
141! format=comma10. , sum(furtive) as furtcount format=comma10. ,
142 sum(search) as searchcount format=comma10. , sum(arrestmade) as arrestcount
142! format=comma10.,
143 sum(summonissue) as summonscount format=comma10., sum(physicalforce) as phyfrc
143! format=comma10.,
144 sum(criminaldress) format=comma10. as crimdress
145 from sqfmonthyear
146 group by tract2010, sqfmthyr ;
NOTE: Table WORK.SQFAGG created, with 1656327 rows and 10 columns.

147 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 1.46 seconds
cpu time 1.42 seconds

 

 

It shows as if it is working without errors, but I should only have ~200K rows of data if it aggregates correctly

 

PaigeMiller
Diamond | Level 26

Did you look at the output data set sqfagg to see what was incorrect in there? Please do that. See what is wrong, and then SHOW US and explain further.

 

Since we don't have your data, please provide a small portion of your data (or make up some data) following these instructions. Please, for this small portion of the data, indicate what you think the correct answers are.

--
Paige Miller
casmcfarland
Calcite | Level 5


138
139 proc sql;
140 create table sqfagg as
141 select sqfmthyr, tract2010, count (*) as stopcount, sum(frisk) as friskcount
141! format=comma10. , sum(furtive) as furtcount format=comma10. ,
142 sum(search) as searchcount format=comma10. , sum(arrestmade) as arrestcount
142! format=comma10.,
143 sum(summonissue) as summonscount format=comma10., sum(physicalforce) as phyfrc
143! format=comma10.,
144 sum(criminaldress) format=comma10. as crimdress
145 from sqfmonthyear
146 group by tract2010, sqfmthyr ;
NOTE: Table WORK.SQFAGG created, with 1656327 rows and 10 columns.

147 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 1.46 seconds
cpu time 1.42 seconds

 

Sorry, this is the first time I have asked a question. SQFMTHYR is a date (date YYYYMM format)

 

Here is the data, the first three rows should have been combined in the sum/count, so stopcount=3, friskcount=2, summonscount=1, phyfrc=3, other variables=0.

casmcfarland_0-1622939963563.png

 

 

 

 

 

Reeza
Super User

Yeah, I think you're definitely running into the date issue then. You need to either use PUT() or convert to PROC MEANS.

Please don't post your data as images. We can't work with an image and then to test your code we'd have to type it out. It's not really helpful. Also, please post your code in a code box, not as free text. It loses all formatting that way.

You don't need to presort for SQL.

proc sql;
create table sqfagg as
select put(sqfmthyr, YYYYMM6.) as reporting_period,
 tract2010, count (*) as stopcount,
 sum(frisk) as friskcount format=comma10. , 
sum(furtive) as furtcount format=comma10. ,
sum(search) as searchcount format=comma10. , 
count(*) as stopcount format=comma10., 
sum(arrestmade) as arrestcount format=comma10.,
sum(summonissue) as summonscount format=comma10., 
sum(physicalforce) as phyfrc format=comma10.,
sum(criminaldress) format=comma10. as crimdress

from sqfmonthyear

group by tract2010, put(sqfmthyr, YYYYMM6.) ;
quit;

If this doesn't work please post the log from this code.

casmcfarland
Calcite | Level 5
This worked perfectly!
Thanks!!!!
Kurt_Bremser
Super User

When you run @Reeza 's code, look at the log. If you get a NOTE about remerging, change the GROUP BY clause to

group by tract2010, calculated reporting_period

But it might be that the syntax as posted already avoids the automatic remerge.

Reeza
Super User

You have an extra semicolon after your first sum. 

That's probably causing issues, but it's not in your log so something isn't lining up here. 

 

Whats the format and type of variable sqfmthyr? If it's a date, it will aggregate at a daily level regardless of formats, SQL doesn't honour SAS formats when aggregating. If that's the case, you can convert it with a PUT() function or use PROC MEANS or REPORT or TABULATE to generate a nice clean table. I'd probably recommend the latter personally but it does depend on a few things. 

 


@casmcfarland wrote:

I am trying to create an aggregate dataset that will have a count of every time a person was stopped, frisked, etc within a census tract within a given month/year. Here is the code I am using and I should be going from 4 million cases to approximately 200K but it is not working and it is aggregating some but not all, so I still have duplicate month/year within census tracts instead of the sum of stops, etc. Please help!

 

proc sort data=sqfmonthyear;
by tract2010 sqfmthyr ;
run;

 

proc sql;
create table sqfagg as
select sqfmthyr, tract2010, count (*) as stopcount; sum(frisk) as friskcount format=comma10. , sum(furtive) as furtcount format=comma10. ,
sum(search) as searchcount format=comma10. , count(*) as stopcount format=comma10., sum(arrestmade) as arrestcount format=comma10.,
sum(summonissue) as summonscount format=comma10., sum(physicalforce) as phyfrc format=comma10.,
sum(criminaldress) format=comma10. as crimdress
from sqfmonthyear
group by tract2010, sqfmthyr ;
quit;


 

jimbarbour
Meteorite | Level 14

Can you show some of the data that you believe should have been aggregated but is not?  It's hard to picture without a few examples.

 

Jim

Sajid01
Meteorite | Level 14

Hello @casmcfarland 
You are grouping by only 2 variables. We need to verify if it is possible to reduce to 200K rows and not the result you got; Please run the following ( or something else  ) to verify how many rows you would be obtaining in the output; The Rows_in_output will represent how many rows you can expect.

proc sql;
select count(*) as Rows_in_output from
(select distinct sqfmthyr, tract2010 from sqfmonthyear );
quit;

Please do post the result.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 1378 views
  • 0 likes
  • 6 in conversation