I am trying to create an aggregate dataset that will have a count of every time a person was stopped, frisked, etc within a census tract within a given month/year. Here is the code I am using and I should be going from 4 million cases to approximately 200K but it is not working and it is aggregating some but not all, so I still have duplicate month/year within census tracts instead of the sum of stops, etc. Please help!
proc sort data=sqfmonthyear;
by tract2010 sqfmthyr ;
run;
proc sql;
create table sqfagg as
select sqfmthyr, tract2010, count (*) as stopcount; sum(frisk) as friskcount format=comma10. , sum(furtive) as furtcount format=comma10. ,
sum(search) as searchcount format=comma10. , count(*) as stopcount format=comma10., sum(arrestmade) as arrestcount format=comma10.,
sum(summonissue) as summonscount format=comma10., sum(physicalforce) as phyfrc format=comma10.,
sum(criminaldress) format=comma10. as crimdress
from sqfmonthyear
group by tract2010, sqfmthyr ;
quit;
Yeah, I think you're definitely running into the date issue then. You need to either use PUT() or convert to PROC MEANS.
Please don't post your data as images. We can't work with an image and then to test your code we'd have to type it out. It's not really helpful. Also, please post your code in a code box, not as free text. It loses all formatting that way.
You don't need to presort for SQL.
proc sql;
create table sqfagg as
select put(sqfmthyr, YYYYMM6.) as reporting_period,
tract2010, count (*) as stopcount,
sum(frisk) as friskcount format=comma10. ,
sum(furtive) as furtcount format=comma10. ,
sum(search) as searchcount format=comma10. ,
count(*) as stopcount format=comma10.,
sum(arrestmade) as arrestcount format=comma10.,
sum(summonissue) as summonscount format=comma10.,
sum(physicalforce) as phyfrc format=comma10.,
sum(criminaldress) format=comma10. as crimdress
from sqfmonthyear
group by tract2010, put(sqfmthyr, YYYYMM6.) ;
quit;
If this doesn't work please post the log from this code.
What is not working about this? Show us.
If there is an error in the LOG, please SHOW US the log, showing the entire log for PROC SQL (we need to see the entire PROC SQL code as it appears in the log and we need to see the NOTEs and ERRORs and WARNINGs, with nothing chopped out, every single character)
Here is the log:
138
139 proc sql;
140 create table sqfagg as
141 select sqfmthyr, tract2010, count (*) as stopcount, sum(frisk) as friskcount
141! format=comma10. , sum(furtive) as furtcount format=comma10. ,
142 sum(search) as searchcount format=comma10. , sum(arrestmade) as arrestcount
142! format=comma10.,
143 sum(summonissue) as summonscount format=comma10., sum(physicalforce) as phyfrc
143! format=comma10.,
144 sum(criminaldress) format=comma10. as crimdress
145 from sqfmonthyear
146 group by tract2010, sqfmthyr ;
NOTE: Table WORK.SQFAGG created, with 1656327 rows and 10 columns.
147 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 1.46 seconds
cpu time 1.42 seconds
It shows as if it is working without errors, but I should only have ~200K rows of data if it aggregates correctly
Did you look at the output data set sqfagg to see what was incorrect in there? Please do that. See what is wrong, and then SHOW US and explain further.
Since we don't have your data, please provide a small portion of your data (or make up some data) following these instructions. Please, for this small portion of the data, indicate what you think the correct answers are.
138
139 proc sql;
140 create table sqfagg as
141 select sqfmthyr, tract2010, count (*) as stopcount, sum(frisk) as friskcount
141! format=comma10. , sum(furtive) as furtcount format=comma10. ,
142 sum(search) as searchcount format=comma10. , sum(arrestmade) as arrestcount
142! format=comma10.,
143 sum(summonissue) as summonscount format=comma10., sum(physicalforce) as phyfrc
143! format=comma10.,
144 sum(criminaldress) format=comma10. as crimdress
145 from sqfmonthyear
146 group by tract2010, sqfmthyr ;
NOTE: Table WORK.SQFAGG created, with 1656327 rows and 10 columns.
147 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 1.46 seconds
cpu time 1.42 seconds
Sorry, this is the first time I have asked a question. SQFMTHYR is a date (date YYYYMM format)
Here is the data, the first three rows should have been combined in the sum/count, so stopcount=3, friskcount=2, summonscount=1, phyfrc=3, other variables=0.
Yeah, I think you're definitely running into the date issue then. You need to either use PUT() or convert to PROC MEANS.
Please don't post your data as images. We can't work with an image and then to test your code we'd have to type it out. It's not really helpful. Also, please post your code in a code box, not as free text. It loses all formatting that way.
You don't need to presort for SQL.
proc sql;
create table sqfagg as
select put(sqfmthyr, YYYYMM6.) as reporting_period,
tract2010, count (*) as stopcount,
sum(frisk) as friskcount format=comma10. ,
sum(furtive) as furtcount format=comma10. ,
sum(search) as searchcount format=comma10. ,
count(*) as stopcount format=comma10.,
sum(arrestmade) as arrestcount format=comma10.,
sum(summonissue) as summonscount format=comma10.,
sum(physicalforce) as phyfrc format=comma10.,
sum(criminaldress) format=comma10. as crimdress
from sqfmonthyear
group by tract2010, put(sqfmthyr, YYYYMM6.) ;
quit;
If this doesn't work please post the log from this code.
When you run @Reeza 's code, look at the log. If you get a NOTE about remerging, change the GROUP BY clause to
group by tract2010, calculated reporting_period
But it might be that the syntax as posted already avoids the automatic remerge.
You have an extra semicolon after your first sum.
That's probably causing issues, but it's not in your log so something isn't lining up here.
Whats the format and type of variable sqfmthyr? If it's a date, it will aggregate at a daily level regardless of formats, SQL doesn't honour SAS formats when aggregating. If that's the case, you can convert it with a PUT() function or use PROC MEANS or REPORT or TABULATE to generate a nice clean table. I'd probably recommend the latter personally but it does depend on a few things.
@casmcfarland wrote:
I am trying to create an aggregate dataset that will have a count of every time a person was stopped, frisked, etc within a census tract within a given month/year. Here is the code I am using and I should be going from 4 million cases to approximately 200K but it is not working and it is aggregating some but not all, so I still have duplicate month/year within census tracts instead of the sum of stops, etc. Please help!
proc sort data=sqfmonthyear;
by tract2010 sqfmthyr ;
run;
proc sql;
create table sqfagg as
select sqfmthyr, tract2010, count (*) as stopcount; sum(frisk) as friskcount format=comma10. , sum(furtive) as furtcount format=comma10. ,
sum(search) as searchcount format=comma10. , count(*) as stopcount format=comma10., sum(arrestmade) as arrestcount format=comma10.,
sum(summonissue) as summonscount format=comma10., sum(physicalforce) as phyfrc format=comma10.,
sum(criminaldress) format=comma10. as crimdress
from sqfmonthyear
group by tract2010, sqfmthyr ;
quit;
Can you show some of the data that you believe should have been aggregated but is not? It's hard to picture without a few examples.
Jim
Hello @casmcfarland
You are grouping by only 2 variables. We need to verify if it is possible to reduce to 200K rows and not the result you got; Please run the following ( or something else ) to verify how many rows you would be obtaining in the output; The Rows_in_output will represent how many rows you can expect.
proc sql;
select count(*) as Rows_in_output from
(select distinct sqfmthyr, tract2010 from sqfmonthyear );
quit;
Please do post the result.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.