rhaley1821
Obsidian | Level 7

Hi all,

 

I am a new SAS user trying to clean up a dataset. I want to recode some categorical variables into a composite variable for water and sanitation quality.

 

I have a dataset from Nicaragua in which the responses contain accented characters. It was originally an SPSS file, and when I import it into SAS all accented characters are converted to unknown characters, which prevents me from running a PROC FREQ. I only need to code 9 categories as 0/1, so the variable could simply be converted to numeric if I knew what the values mean. Can someone please advise on how to get rid of the unknown characters? The variable in question is S1P15.

 

The dataset is attached. My current PROC IMPORT is below:

*import spss dataset and convert;
proc import datafile = "/folders/myfolders/sasuser.v94/WFP/datasets/EMNV14-02 DATOS DE LA VIVIENDA Y EL HOGAR (1).SAV"
out= work.nicaragua
dbms=sav
replace;
run;

 

Thank you! 

japelin
Rhodochrosite | Level 12

Try this code.

It's not perfect, but I think it will let you run PROC FREQ on the categorical variables.

filename imp "/folders/myfolders/sasuser.v94/WFP/datasets/EMNV14-02 DATOS DE LA VIVIENDA Y EL HOGAR (1).SAV" encoding='utf-8';
proc import datafile = imp
  out= work.nicaragua
  dbms=sav
  replace;
run;
Tom
Super User

Your example dataset only has numeric variables.  So the dataset should work fine.

But the formats might be generated using the original encoding instead of the encoding of your SAS session.
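
To see whether this applies, it may help to check which encoding the SAS session is actually running in and which formats are attached to the imported variables. A minimal sketch, assuming the import has already created WORK.NICARAGUA:

```sas
/* Show the encoding of the current SAS session (for example UTF-8 or WLATIN1) */
proc options option=encoding;
run;
%put NOTE: Session encoding is &sysencoding;

/* List variable types and attached formats for the imported data */
proc contents data=work.nicaragua;
run;
```

If the session reports UTF-8 while the SPSS labels were written in a single-byte encoding such as WLATIN1, accented characters in the format labels would be expected to display as unknown characters.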

Here is a method to convert the format text from WLATIN1 to UTF-8.

First import the SAV file and tell it to build the format catalog.

proc import datafile = "c:\downloads\spss.sav"
  dbms=sav
  out= work.nicaragua replace
;
  fmtlib=work.nicaragua;
run;

Then convert the format catalog to a dataset and change the values of the LABEL variable from WLATIN1 to UTF-8 encoding. Drop the MIN/MAX/DEFAULT/LENGTH variables so that PROC FORMAT recalculates the default length from the adjusted label values.

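* Write the format catalog out as a control dataset;
proc format lib=work.nicaragua cntlout=formats; run;
data formats;
  length label $200;
  set formats ;
  * Re-encode the label text from WLATIN1 to UTF-8;
  label=kcvt(label,'wlatin1','utf-8');
  keep fmtname start end label;
run;
* Rebuild the catalog from the corrected control dataset;
proc format lib=work.nicaragua cntlin=formats ; run;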
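
To see what the KCVT call is doing, here is a small sketch on a single made-up value rather than the real labels. The word used is purely illustrative. In WLATIN1 the letter ñ is the single byte F1, while in UTF-8 it is the two bytes C3 B1, so the hex dump of the converted value should be one byte longer than the original.

```sas
data _null_;
  length raw $10 fixed $40;
  /* 'F1'x is the WLATIN1 byte for n-tilde; the word itself is a made-up example */
  raw = cats('Ba', 'F1'x, 'o');
  /* Same conversion as applied to LABEL above */
  fixed = kcvt(raw, 'wlatin1', 'utf-8');
  /* Compare the first 4 bytes before and the first 5 bytes after conversion */
  put raw= $hex8. / fixed= $hex10.;
run;
```

The extra bytes are also why LABEL is given a generous length of $200 before the conversion: UTF-8 text can need more storage than the same text in WLATIN1.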

Now let's try using the labels. If you didn't write the formats into the WORK.FORMATS catalog then make sure to add the catalog to the FMTSEARCH option.

options insert=(fmtsearch=(work.nicaragua));
proc freq data=nicaragua;
 tables S1P25 ;
run;

Results:

(screenshot of the PROC FREQ output: image.png)
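
Once the labels are readable, the 0/1 coding asked about in the original question could be sketched as below. This is only an illustration: the variable name WATER_OK and the code list (1, 2, 3) are placeholders, not values taken from the survey. Replace them with the category codes that the corrected labels identify as acceptable.

```sas
data work.nicaragua_coded;
  set work.nicaragua;
  /* ASSUMPTION: codes 1, 2 and 3 are placeholders for the acceptable categories */
  if missing(S1P15) then water_ok = .;
  else water_ok = (S1P15 in (1, 2, 3));
run;

/* Cross-check the recode against the original categories */
proc freq data=work.nicaragua_coded;
  tables S1P15*water_ok / list missing;
run;
```

Keeping missing values missing, instead of letting them fall into 0, avoids counting non-responses as poor water or sanitation quality.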

 
