BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
kk131
Fluorite | Level 6

Hi,

 

I am a new sas user and currently working on an assignment. I have some raw data which includes product categories, rating, no. of ratings and prices. I have used the proc freq to show no. of products within each category and the % contribution. I have also sorted the data as per the number of products within each category. However, I only want to show the top 5 data and now the entire table, since some categories have less than 10 products which may not be relevant. I have currently used the below code and also tried using obs = 5 etc but that has not worked, the output was not sorted and picked the top 5 results.  Alternatively, I have tried using another code that only displays the top most category and not the rest. Any suggestions on what I can do? Sharing the two codes and the screenshot of the O/P i am receiving. 

 

Original code: 

proc freq data= Amazon1.Products_mod 
order = freq;
tables category;
run;

proc sort data= amazon1.products_mod;
by category;
run;


Alternate code:

 

/* Sort the dataset by category */
proc sort data=amazon1.products_mod out=sorted_products;
by category;
run;

/* Calculate the frequency of each category */
proc freq data=sorted_products;
tables category / out=freq_table;
run;

/* Sort the frequency table in descending order of frequency */
proc sort data=freq_table;
by descending count;
run;

/* Print the first 5 rows of the frequency table */
proc print data=freq_table(obs=5);
var category count;
run;

kk131_0-1684450978496.png

 

Any help will be appreciated! Thank you 🙂 

1 ACCEPTED SOLUTION

Accepted Solutions
Watts
SAS Employee

You can use the MAXLEVELS= option, together with ORDER=FREQ, to display the categories with the largest frequencies (most products). For example,

proc freq  data=Amazon1.Products_mod  order=freq;
	tables category / maxlevels=5;
run;

View solution in original post

4 REPLIES 4
ballardw
Super User

I think that you need to provide a bit more details.

For instance you mention: " product categories, rating, no. of ratings and prices" but do not describe anywhere quite how anything other than Category is involved. And there seems to be an implied use of "product" within categories but you don't mention that at all other than to say that you "sorted the data as per the number of products within each category".

Best would be to provide a small example of some dummy data with maybe 5 categories and a number of values for the other variables. The sample should be small enough to show what you expect for output given that example data. Then show that desired result. The categories don't even have to be very entertaining, single letters or even numerals would do.

The example is best provided in the form of data step code. Something like this:

data have;
   input category $ rating NumberOfRatings price;
datalines;
a   4   25 15.55
b   3.5  16  12.01
;

If the actual products are import as well then you need to add a Product variable. That might look like:

data have;
   input category $ rating NumberOfRatings price product $;
datalines;
a   4   25 15.55     zz
b   3.5  16  12.01   yy
;

The above code will only allow values of Category and Product to have 8 characters and no spaces.

 

Note that there is no inherent "top 5" of anything. There details needed to define "top". A "top" score might be the lowest numeric value , as in 1, 2, 3 for first, second, third. So you need to provide some rules as to define what is used to determine the ranking order AND how to calculate the rank (might be largest count or something else).

 

From looking at that proc print output you may need to redesign your data depending on what you expect for analysis. The values look like menu level choices.

mkeintz
PROC Star

Here is code for dataset sashelp.cars, which I think emulates what you want (i.e. a print of the category ("make" in this example) and frequency of the most frequent makes, in descending order.

 

proc freq data=sashelp.cars  order=freq;
  table make / out=result;
run;
proc print data=result (obs=5);
  var make count;
run;

Note you could add the option NOPRINT to the proc freq statement if you are not interested in seeing all the frequencies.  The "out=" option generates a dataset of frequencies, from which you can print out selected make/frequency pairs.  

 

Also, you don't need all the sorting procedures in your alternative code.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Watts
SAS Employee

You can use the MAXLEVELS= option, together with ORDER=FREQ, to display the categories with the largest frequencies (most products). For example,

proc freq  data=Amazon1.Products_mod  order=freq;
	tables category / maxlevels=5;
run;
kk131
Fluorite | Level 6

Thank you so much this worked! I am also facing another issue if you could advise.

 

I have imported the excel sheet that i have, which consists of these fields . Here the discounted price shows as 176.63

kk131_0-1684477404124.png

However, in SAS it is different for some values - for example the same value is showing as Rs. 177663.00. It has not taken into account the 176.63. 

kk131_1-1684477460687.png

This is is the code i used - which includes compressing it to remove the rupee sign, changing it from char to num. 

data Amazon.Products_mod;
set Amazon.Products_Nodup;

A_Price= input(compress(actual_price, , 'kd'), comma10.);
put "Actual Price (numeric):" A_Price comma10.;
format A_Price currency_fmt.;

D_Price= input(compress(discounted_price, , 'kd'), comma12.);
put "Discounted Price (numeric):" D_Price comma12.;
format D_Price currency_fmt.;

rating_count_cmp = compress(rating_count);
rating_count_new = input(rating_count_cmp, comma12.);

* Dropping irrelevant values or old values that have been converted and replaced;
drop discounted_price rating_count_cmp rating_count img_link product_link actual_price;
run;

* including the INR sign;
proc format;
picture currency_fmt (round)
low-high = '₹' 00000009.00 (prefix='₹');
run;

Any suggestion on what can be done here and how i can get the right number + a comma for ease of readability. 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 1319 views
  • 1 like
  • 4 in conversation