Solved: Re: Calculate percentages per variable

Nandeep · Posted 02-07-2019 04:47 AM

I have a dataset which has circa 1.5million rows of data in the following format

Postcode Roof Type Quotes

Postcode1 Roof1 10

Postcode1 Roof2 20

Postcode1 Roof3 30

Postcode2 Roof1 10

Postcode2 Roof2 20

Postcode3 Roof2 10

Postcode3 Roof3 20

I'd like to output the percentage of the different roof types per post code, which I believe should look like:

Postcode Roof Type %

Postcode1 Roof1 16.66

Postcode1 Roof2 33.33

Postcode1 Roof3 50

Postcode2 Roof1 33.33

Postcode2 Roof2 66.66

Postcode3 Roof2 33.33

Postcode3 Roof3 66.66

Would you be able to advise on the syntax required to produce this type of output? it can be in cross tab format if easier but I would need to export the resulting dataset.

thanks in advance

Nandeep

RW9 · Posted 02-07-2019 05:25 AM

Please post test data in the form of a datastep. As such this code is not tested:

proc sql;
  create table want as 
  select a.*,
         (a.quotes / (select sum(quotes) from have where  postcode=a.postcode)) * 100
  from   have a;
quit;

View solution in original post

RW9 · Posted 02-07-2019 05:25 AM

Please post test data in the form of a datastep. As such this code is not tested:

proc sql;
  create table want as 
  select a.*,
         (a.quotes / (select sum(quotes) from have where  postcode=a.postcode)) * 100
  from   have a;
quit;

Nandeep · Posted 02-07-2019 09:31 AM

@RW9 wrote:
Please post test data in the form of a datastep. As such this code is not tested:
proc sql;
  create table want as 
  select a.*,
         (a.quotes / (select sum(quotes) from have where  postcode=a.postcode)) * 100
  from   have a;
quit;

Thanks that's worked a treat

RW9 · Posted 02-07-2019 10:19 AM

In this instance I would mark @PaigeMiller 's answer as correct. Always use the built in functions if they produce what you need as they will be simpler, faster, and use less resources. SQL won't scale up well.

PaigeMiller · Posted 02-07-2019 09:16 AM

PROC FREQ was designed to do things like this, and also produces an output data set.

proc freq data=have;
    by postcode;
    table rooftype/out=want;
    weight quotes;
run;

--
Paige Miller

Nandeep · Posted 02-07-2019 09:32 AM

@PaigeMiller wrote:
PROC FREQ was designed to do things like this, and also produces an output data set.
proc freq data=have;
    by postcode;
    table rooftype/out=want;
    weight quotes;
run;

this also worked thanks

Hasnan · Posted 02-07-2019 10:26 AM

Hi Nandeep

I would have used the proc freq method, but just an FYI with either method, it would take quite a while to run!

Regards

ballardw · Posted 02-07-2019 10:52 AM

@Hasnan wrote:

Hi Nandeep

I would have used the proc freq method, but just an FYI with either method, it would take quite a while to run!

Regards

Approximately 0.22 sec to create a data set with 1,500,000 records randomly assigning 10 postcodes.

0.14 Seconds for proc freq to summarize.

Of course that is my machine and your mileage may vary. But a proc freq for on the order of 1 or 2 million records for a single variable to a data set does not take very much time unless network or other performance issues arise.

I would recommend NOPRINT if you only want a data set with proc freq to avoid potentials display time delays building largish html output tables if you have many codes.

Nandeep · Posted 02-08-2019 02:30 AM

@ballardw wrote:
@Hasnan wrote:
Hi Nandeep

I would have used the proc freq method, but just an FYI with either method, it would take quite a while to run!

Regards

Approximately 0.22 sec to create a data set with 1,500,000 records randomly assigning 10 postcodes.
0.14 Seconds for proc freq to summarize.

Of course that is my machine and your mileage may vary. But a proc freq for on the order of 1 or 2 million records for a single variable to a data set does not take very much time unless network or other performance issues arise.
I would recommend NOPRINT if you only want a data set with proc freq to avoid potentials display time delays building largish html output tables if you have many codes.

Hi,

Yes the NOPRINT certainly seems to have sped up the process, and even on my slow machine the Proc Freq was completed within 45 seconds.

The 2025 SAS Hackathon has begun!