I am a new sas user and currently working on an assignment. I have some raw data which includes product categories, rating, no. of ratings and prices. I have used the proc freq to show no. of products within each category and the % contribution. I have also sorted the data as per the number of products within each category. However, I only want to show the top 5 data and now the entire table, since some categories have less than 10 products which may not be relevant. I have currently used the below code and also tried using obs = 5 etc but that has not worked, the output was not sorted and picked the top 5 results. Alternatively, I have tried using another code that only displays the top most category and not the rest. Any suggestions on what I can do? Sharing the two codes and the screenshot of the O/P i am receiving.
proc freq data= Amazon1.Products_mod
order = freq;
proc sort data= amazon1.products_mod;
/* Sort the dataset by category */
proc sort data=amazon1.products_mod out=sorted_products;
/* Calculate the frequency of each category */
proc freq data=sorted_products;
tables category / out=freq_table;
/* Sort the frequency table in descending order of frequency */
proc sort data=freq_table;
by descending count;
/* Print the first 5 rows of the frequency table */
proc print data=freq_table(obs=5);
var category count;
Any help will be appreciated! Thank you 🙂
I think that you need to provide a bit more details.
For instance you mention: " product categories, rating, no. of ratings and prices" but do not describe anywhere quite how anything other than Category is involved. And there seems to be an implied use of "product" within categories but you don't mention that at all other than to say that you "sorted the data as per the number of products within each category".
Best would be to provide a small example of some dummy data with maybe 5 categories and a number of values for the other variables. The sample should be small enough to show what you expect for output given that example data. Then show that desired result. The categories don't even have to be very entertaining, single letters or even numerals would do.
The example is best provided in the form of data step code. Something like this:
data have; input category $ rating NumberOfRatings price; datalines; a 4 25 15.55 b 3.5 16 12.01 ;
If the actual products are import as well then you need to add a Product variable. That might look like:
data have; input category $ rating NumberOfRatings price product $; datalines; a 4 25 15.55 zz b 3.5 16 12.01 yy ;
The above code will only allow values of Category and Product to have 8 characters and no spaces.
Note that there is no inherent "top 5" of anything. There details needed to define "top". A "top" score might be the lowest numeric value , as in 1, 2, 3 for first, second, third. So you need to provide some rules as to define what is used to determine the ranking order AND how to calculate the rank (might be largest count or something else).
From looking at that proc print output you may need to redesign your data depending on what you expect for analysis. The values look like menu level choices.
Here is code for dataset sashelp.cars, which I think emulates what you want (i.e. a print of the category ("make" in this example) and frequency of the most frequent makes, in descending order.
proc freq data=sashelp.cars order=freq; table make / out=result; run; proc print data=result (obs=5); var make count; run;
Note you could add the option NOPRINT to the proc freq statement if you are not interested in seeing all the frequencies. The "out=" option generates a dataset of frequencies, from which you can print out selected make/frequency pairs.
Also, you don't need all the sorting procedures in your alternative code.
Thank you so much this worked! I am also facing another issue if you could advise.
I have imported the excel sheet that i have, which consists of these fields . Here the discounted price shows as 176.63
However, in SAS it is different for some values - for example the same value is showing as Rs. 177663.00. It has not taken into account the 176.63.
This is is the code i used - which includes compressing it to remove the rupee sign, changing it from char to num.
A_Price= input(compress(actual_price, , 'kd'), comma10.);
put "Actual Price (numeric):" A_Price comma10.;
format A_Price currency_fmt.;
D_Price= input(compress(discounted_price, , 'kd'), comma12.);
put "Discounted Price (numeric):" D_Price comma12.;
format D_Price currency_fmt.;
rating_count_cmp = compress(rating_count);
rating_count_new = input(rating_count_cmp, comma12.);
* Dropping irrelevant values or old values that have been converted and replaced;
drop discounted_price rating_count_cmp rating_count img_link product_link actual_price;
* including the INR sign;
picture currency_fmt (round)
low-high = '₹' 00000009.00 (prefix='₹');
Any suggestion on what can be done here and how i can get the right number + a comma for ease of readability.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
For SAS newbies, this video is a great way to get started. James Harroun walks through the process using SAS Studio for SAS OnDemand for Academics, but the same steps apply to any analytics project.
Find more tutorials on the SAS Users YouTube channel.