Hi All,
I am using the frequency count for one of my columns called segments. This column ideally should only have 3 types of segments (Consumer, Corporate and Home Office), however when I used the My Task >Task > Statistics > One-Way Frequencies my results are also including some stray data from the next column like in the image below. Is this a bug or am i doing something wrong?
proc freq data=WORK.APACSTOREUPDATED;
tables Segment / plots=(freqplot cumfreqplot);
run;
above is the code used for this. I have also included the raw data, which is the superstore data we usually use for tableau.
It took some time to wrap my head around this one but now I think I figured out what is going on. Apparently your raw data has few non-standard characters that sas is having trouble reading .The next column is getting affected whenever it encounters a funky character.
Example :
Peter Bühler
You need to change the encoding while reading the data and see if this works, this works for me . Specify the encoding in the filename.
filename tmp_ '/folders/myfolders/sasuser.v94/test123.csv' ENCODING="WLATIN1";
proc import datafile=tmp_
out=test123
dbms=csv
replace;
getnames=yes;
run;
proc freq data=test123;
tables Segment / missing;
run;
Result :
Segment Frequency Percent Cumulative Consumer 5658 51.75 5658 51.75 Corporate 3264 29.85 8922 81.61 Home Office 2011 18.39 10933 100.00
There is no possibility of straying into other columns while generating the frequency other than the column specified. Check to see if the raw dataset you have attached and the permanent data you have used are the same.
For further validation try this step on your data and observe the log :
data _null_;
set WORK.APACSTOREUPDATED;
if segment in ('Jieyang','Orange','Shenzen','Toowoomba') then
put 'Segment=' segment;
run;
Hello r_behata,
Thank you for the prompt reply. I have double and triple checked on the data source, even tried filtering in excel itself along with tableau and only in SAS I am seeing those additional values. I have even included the raw data so that the same result could be replicated. Also, I am quite new to SAS so I am not entirely sure if I need to do some pre-processing on these data before working on them.
Is the source of your SAS Dataset the Same to the csv file you are referring ? It appears to me that they are not the same.
Change the path in the code snippet, import the csv file and run the frequency on the imported data and see if you get the same result.
proc import datafile="C:\temp\test123.csv" out=test123 dbms=csv replace; getnames=yes; run;
Sorry, I just renamed the file before sending it out earlier. The image below is based on the CSV file (test123) that i've uploaded. the result is still the same... 😞
/*
*
* Task code generated by SAS Studio 3.8
*
* Generated on '12/1/19, 1:48 PM'
* Generated by 'u44588303'
* Generated on server 'ODAWS04-USW2.ODA.SAS.COM'
* Generated on SAS platform 'Linux LIN X64 3.10.0-1062.4.3.el7.x86_64'
* Generated on SAS version '9.04.01M6P11072018'
* Generated on browser 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'
* Generated on web client 'https://odamid.oda.sas.com/SASStudio/main?locale=en_MY&zone=GMT%252B08%253A00&https%3A%2F%2Fodamid.oda.sas.com%2FSASStudio%2F%3Bc6db82655ce99dc331e2a3a205f631cc_Cluster2=1CD749515D9C70EAE22184785DC751EC.odamid01_SASServer2_1'
*
*/
proc freq data=WEBWORK.TEST123;
tables Segment / plots=(freqplot cumfreqplot);
run;
It took some time to wrap my head around this one but now I think I figured out what is going on. Apparently your raw data has few non-standard characters that sas is having trouble reading .The next column is getting affected whenever it encounters a funky character.
Example :
Peter Bühler
You need to change the encoding while reading the data and see if this works, this works for me . Specify the encoding in the filename.
filename tmp_ '/folders/myfolders/sasuser.v94/test123.csv' ENCODING="WLATIN1";
proc import datafile=tmp_
out=test123
dbms=csv
replace;
getnames=yes;
run;
proc freq data=test123;
tables Segment / missing;
run;
Result :
Segment Frequency Percent Cumulative Consumer 5658 51.75 5658 51.75 Corporate 3264 29.85 8922 81.61 Home Office 2011 18.39 10933 100.00
Thank You very much. You are a lifesaver. I did not realize that the formatting would cause an issue like this!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.