BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Nandeep
Fluorite | Level 6

I have a dataset which has circa 1.5million rows of data in the following format

 

Postcode              Roof Type          Quotes

Postcode1            Roof1                  10

Postcode1            Roof2                   20

Postcode1            Roof3                   30

Postcode2            Roof1                  10

Postcode2            Roof2                   20

Postcode3            Roof2                  10

Postcode3            Roof3                   20

 

I'd like to output the percentage of the different roof types per post code, which I believe should look like:

 

 

Postcode              Roof Type          %

Postcode1            Roof1                  16.66

Postcode1            Roof2                   33.33

Postcode1            Roof3                   50

Postcode2            Roof1                  33.33

Postcode2            Roof2                   66.66

Postcode3            Roof2                  33.33

Postcode3            Roof3                   66.66

 

Would you be able to advise on the syntax required to produce this type of output? it can be in cross tab format if easier but I would need to export the resulting dataset.

 

thanks in advance

 

Nandeep

 

                 

1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Please post test data in the form of a datastep.  As such this code is not tested:

proc sql;
  create table want as 
  select a.*,
         (a.quotes / (select sum(quotes) from have where  postcode=a.postcode)) * 100
  from   have a;
quit;

View solution in original post

8 REPLIES 8
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Please post test data in the form of a datastep.  As such this code is not tested:

proc sql;
  create table want as 
  select a.*,
         (a.quotes / (select sum(quotes) from have where  postcode=a.postcode)) * 100
  from   have a;
quit;
Nandeep
Fluorite | Level 6

@RW9 wrote:

Please post test data in the form of a datastep.  As such this code is not tested:

proc sql;
  create table want as 
  select a.*,
         (a.quotes / (select sum(quotes) from have where  postcode=a.postcode)) * 100
  from   have a;
quit;

Thanks that's worked a treat

RW9
Diamond | Level 26 RW9
Diamond | Level 26

In this instance I would mark @PaigeMiller 's answer as correct.  Always use the built in functions if they produce what you need as they will be simpler, faster, and use less resources.  SQL won't scale up well.

PaigeMiller
Diamond | Level 26

PROC FREQ was designed to do things like this, and also produces an output data set.

 

proc freq data=have;
    by postcode;
    table rooftype/out=want;
    weight quotes;
run;
--
Paige Miller
Nandeep
Fluorite | Level 6

@PaigeMiller wrote:

PROC FREQ was designed to do things like this, and also produces an output data set.

 

proc freq data=have;
    by postcode;
    table rooftype/out=want;
    weight quotes;
run;

this also worked thanks

Hasnan
Calcite | Level 5

Hi Nandeep

 

I would have used the proc freq method, but just an FYI with either method, it would take quite a while to run!

 

Regards

 

ballardw
Super User

@Hasnan wrote:

Hi Nandeep

 

I would have used the proc freq method, but just an FYI with either method, it would take quite a while to run!

 

Regards

 


Approximately 0.22 sec to create a data set with 1,500,000 records randomly assigning 10 postcodes.

0.14 Seconds for proc freq to summarize.

 

Of course that is my machine and your mileage may vary. But a proc freq for on the order of 1 or 2 million records for a single variable to a data set does not take very much time unless network or other performance issues arise.

I would recommend NOPRINT if you only want a data set with proc freq to avoid potentials display time delays building largish html output tables if you have many codes.

Nandeep
Fluorite | Level 6

@ballardw wrote:

@Hasnan wrote:

Hi Nandeep

 

I would have used the proc freq method, but just an FYI with either method, it would take quite a while to run!

 

Regards

 


Approximately 0.22 sec to create a data set with 1,500,000 records randomly assigning 10 postcodes.

0.14 Seconds for proc freq to summarize.

 

Of course that is my machine and your mileage may vary. But a proc freq for on the order of 1 or 2 million records for a single variable to a data set does not take very much time unless network or other performance issues arise.

I would recommend NOPRINT if you only want a data set with proc freq to avoid potentials display time delays building largish html output tables if you have many codes.


Hi,

 

Yes the NOPRINT certainly seems to have sped up the process, and even on my slow machine the Proc Freq was completed within 45 seconds.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 925 views
  • 0 likes
  • 5 in conversation