BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Kumarathevan
Fluorite | Level 6

Hi All, 

 

I am using the frequency count for one of my columns called segments. This column ideally should only have 3 types of segments (Consumer, Corporate and Home Office), however when I used the My Task >Task > Statistics > One-Way Frequencies my results are also including some stray data from the next column like in the image below. Is this a bug or am i doing something wrong? 

 

Frequency Count Segments.png

 

proc freq data=WORK.APACSTOREUPDATED;
	tables Segment / plots=(freqplot cumfreqplot);
run;

above is the code used for this. I have also included the raw data, which is the superstore data we usually use for tableau. 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
r_behata
Barite | Level 11

It took some time to wrap my head around this one but  now I think I figured out what is going on. Apparently your raw data has few non-standard characters that sas is having trouble reading .The next column is getting affected  whenever it encounters a funky character.

 

Example :

Peter Bühler

 

You need to change the encoding while reading the data and see if this works, this works for me . Specify the encoding in the filename.

 



filename tmp_ '/folders/myfolders/sasuser.v94/test123.csv'  ENCODING="WLATIN1";

proc import datafile=tmp_
        out=test123
        dbms=csv
        replace;
         getnames=yes;
run;

proc freq data=test123;
	tables Segment / missing;
run;

 

Result :

 

Segment	Frequency	Percent	Cumulative
Consumer	5658	51.75	5658	51.75
Corporate	3264	29.85	8922	81.61
Home Office	2011	18.39	10933	100.00

 

View solution in original post

6 REPLIES 6
r_behata
Barite | Level 11

There is no possibility  of straying into other columns while generating the frequency other than the column specified. Check to see if the raw dataset  you have attached and the permanent data you have used are the same.

 

For further validation try this step on your data and observe the log :

 

 

 

data _null_;
set WORK.APACSTOREUPDATED;
if segment in ('Jieyang','Orange','Shenzen','Toowoomba') then 
	put 'Segment=' segment;
run;

 

 

Kumarathevan
Fluorite | Level 6

Hello r_behata, 

 

Thank you for the prompt reply. I have double and triple checked on the data source, even tried filtering in excel itself along with tableau and only in SAS I am seeing those additional values. I have even included the raw data so that the same result could be replicated. Also, I am quite new to SAS so I am not entirely sure if I need to do some pre-processing on these data before working on them. 

r_behata
Barite | Level 11

Is the source of your SAS Dataset  the Same to the csv file you are referring ? It appears to me that they are not the same. 

 

Change the path in the code snippet, import the csv file and run the frequency on the imported data and see if you get the same result.

 

proc import datafile="C:\temp\test123.csv"
        out=test123
        dbms=csv
        replace;
         getnames=yes;
run;
Kumarathevan
Fluorite | Level 6

Sorry, I just renamed the file before sending it out earlier. The image below is based on the CSV file (test123) that i've uploaded. the result is still the same... 😞

 

test123Segments.png

/*
 *
 * Task code generated by SAS Studio 3.8 
 *
 * Generated on '12/1/19, 1:48 PM' 
 * Generated by 'u44588303' 
 * Generated on server 'ODAWS04-USW2.ODA.SAS.COM' 
 * Generated on SAS platform 'Linux LIN X64 3.10.0-1062.4.3.el7.x86_64' 
 * Generated on SAS version '9.04.01M6P11072018' 
 * Generated on browser 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36' 
 * Generated on web client 'https://odamid.oda.sas.com/SASStudio/main?locale=en_MY&zone=GMT%252B08%253A00&https%3A%2F%2Fodamid.oda.sas.com%2FSASStudio%2F%3Bc6db82655ce99dc331e2a3a205f631cc_Cluster2=1CD749515D9C70EAE22184785DC751EC.odamid01_SASServer2_1' 
 *
 */

proc freq data=WEBWORK.TEST123;
	tables Segment / plots=(freqplot cumfreqplot);
run;
r_behata
Barite | Level 11

It took some time to wrap my head around this one but  now I think I figured out what is going on. Apparently your raw data has few non-standard characters that sas is having trouble reading .The next column is getting affected  whenever it encounters a funky character.

 

Example :

Peter Bühler

 

You need to change the encoding while reading the data and see if this works, this works for me . Specify the encoding in the filename.

 



filename tmp_ '/folders/myfolders/sasuser.v94/test123.csv'  ENCODING="WLATIN1";

proc import datafile=tmp_
        out=test123
        dbms=csv
        replace;
         getnames=yes;
run;

proc freq data=test123;
	tables Segment / missing;
run;

 

Result :

 

Segment	Frequency	Percent	Cumulative
Consumer	5658	51.75	5658	51.75
Corporate	3264	29.85	8922	81.61
Home Office	2011	18.39	10933	100.00

 

Kumarathevan
Fluorite | Level 6

Thank You very much. You are a lifesaver. I did not realize that the formatting would cause an issue like this!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 2334 views
  • 1 like
  • 2 in conversation