I have a dataset which has circa 1.5million rows of data in the following format
Postcode Roof Type Quotes
Postcode1 Roof1 10
Postcode1 Roof2 20
Postcode1 Roof3 30
Postcode2 Roof1 10
Postcode2 Roof2 20
Postcode3 Roof2 10
Postcode3 Roof3 20
I'd like to output the percentage of the different roof types per post code, which I believe should look like:
Postcode Roof Type %
Postcode1 Roof1 16.66
Postcode1 Roof2 33.33
Postcode1 Roof3 50
Postcode2 Roof1 33.33
Postcode2 Roof2 66.66
Postcode3 Roof2 33.33
Postcode3 Roof3 66.66
Would you be able to advise on the syntax required to produce this type of output? it can be in cross tab format if easier but I would need to export the resulting dataset.
thanks in advance
Nandeep
Please post test data in the form of a datastep. As such this code is not tested:
proc sql; create table want as select a.*, (a.quotes / (select sum(quotes) from have where postcode=a.postcode)) * 100 from have a; quit;
Please post test data in the form of a datastep. As such this code is not tested:
proc sql; create table want as select a.*, (a.quotes / (select sum(quotes) from have where postcode=a.postcode)) * 100 from have a; quit;
@RW9 wrote:Please post test data in the form of a datastep. As such this code is not tested:
proc sql; create table want as select a.*, (a.quotes / (select sum(quotes) from have where postcode=a.postcode)) * 100 from have a; quit;
Thanks that's worked a treat
In this instance I would mark @PaigeMiller 's answer as correct. Always use the built in functions if they produce what you need as they will be simpler, faster, and use less resources. SQL won't scale up well.
PROC FREQ was designed to do things like this, and also produces an output data set.
proc freq data=have;
by postcode;
table rooftype/out=want;
weight quotes;
run;
@PaigeMiller wrote:PROC FREQ was designed to do things like this, and also produces an output data set.
proc freq data=have; by postcode; table rooftype/out=want; weight quotes; run;
this also worked thanks
Hi Nandeep
I would have used the proc freq method, but just an FYI with either method, it would take quite a while to run!
Regards
@Hasnan wrote:
Hi Nandeep
I would have used the proc freq method, but just an FYI with either method, it would take quite a while to run!
Regards
Approximately 0.22 sec to create a data set with 1,500,000 records randomly assigning 10 postcodes.
0.14 Seconds for proc freq to summarize.
Of course that is my machine and your mileage may vary. But a proc freq for on the order of 1 or 2 million records for a single variable to a data set does not take very much time unless network or other performance issues arise.
I would recommend NOPRINT if you only want a data set with proc freq to avoid potentials display time delays building largish html output tables if you have many codes.
@ballardw wrote:
@Hasnan wrote:Hi Nandeep
I would have used the proc freq method, but just an FYI with either method, it would take quite a while to run!
Regards
Approximately 0.22 sec to create a data set with 1,500,000 records randomly assigning 10 postcodes.
0.14 Seconds for proc freq to summarize.
Of course that is my machine and your mileage may vary. But a proc freq for on the order of 1 or 2 million records for a single variable to a data set does not take very much time unless network or other performance issues arise.
I would recommend NOPRINT if you only want a data set with proc freq to avoid potentials display time delays building largish html output tables if you have many codes.
Hi,
Yes the NOPRINT certainly seems to have sped up the process, and even on my slow machine the Proc Freq was completed within 45 seconds.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.