BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
ameliasalem
Obsidian | Level 7

Below is the code that I have started pulling together and I know that I must be making a really simple mistake but I can't spend more time trying to figure out what I'm doing incorrectly so, hoping someone else can spot it for me. 

 

Every time I run lines 19 and onwards, I am not finding a 'workingasr' (line 19 is repeated below sorry about that) in the output data, and the changes I'm making to the data set are not showing up in OUTPUT DATA but they are sort of showing up in the mydata.sas7bdat file which.. I wanted to keep as my original file for reference.. 

 

I want to be editing a 'working' version of that data set. Any tips? Thanks in advance for any support or guidance.

ameliasalem_0-1690864765696.png

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Why do you think it makes any difference in the analysis whether the value are coded 1/2 or 1/0 ?

You should not need to recode the variable.  

 

What analysis are you trying to perform?

What procedure are you using?

 

If you want to look at the definition of the formats that were created then use PROC FORMAT with the FMTLIB option.  Or perhaps make a dataset of the formats by using the CNTLOUT= option.

 

From your picture of the code book it looks like you should recode 8 and 9 to one of the 27 missing values so that are excluded from your analysis automatically.  I don't know if STATA has an SPSS like command that lets you tell that it should treat some otherwise valid values as meaning misisng.  But in SAS it is best to get set the values to missing to begin with and then the procedures will treat them properly automatically.

 

To have the value print "pretty" you might need to also redefine the format or define a new format and use that with the recoded variable.

 

proc format ;
value gender 
  1='1 Male' 2='2 Female' .d='Do not know' .r='Refused'
;
run;

data step1;
  set have;
  gender=qn1f;
  if gender=8 then gender=.d ;
  else if gender=9 then gender=.r ;
  format gender gender.;
run;
proc freq data=step1;
  tables gender / missprint;
run;

View solution in original post

17 REPLIES 17
Kurt_Bremser
Super User

Please do not post pictures of code. Post SAS code as text by copy/pasting it into a code box opened with the "little running man" button. Overtype sensitive information there.

 

Look at the log. It will tell you what went wrong. If in doubt, post the log (all code and messages) by copy/pasting it into a window opened with the </> button.

ameliasalem
Obsidian | Level 7

Got it! My apologies. Let me start over here then. 

1) I brought the file in using this code: 

 

libname mylib '/home/xxxxxxx/sasuser.v94';
proc import out=mylib.mydata datafile='/home/xxxxxxx/2019 ASR_Public_Use_File.dta' dbms=DTA replace;
run;

proc print data=mylib.mydata;
run;

proc freq data=mylib.mydata;
table _all_;
run;
proc means data=mylib.mydata;
run;

2) I try to set up a 'working' version so as not to overwrite the original:

 

 

DATA asr;
set '/home/xxxxxxx/sasuser.v94/mydata.sas7bdat';
run;

3) I run the below code to try to start modifying variables from the 'working' data set and the changes do not appear in OUTPUT DATA in the 'working asr' version:

 

 

data workingasr;
set asr;
if qn1f='male' then qn1f=1;
else if qn1f='female' then qn1f=0;
if respondent='respondent' then respondent=1;
else if respondent='not respondent' then respondent=0;
run;

4) I checked the LOG and it appears to be making changes to the mydata.sas7bdat dataset or at least, reading from it? 

 NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
       71:9    72:14   73:15   74:20   
 NOTE: Invalid numeric data, 'male' , at line 71 column 9.
 NOTE: Invalid numeric data, 'female' , at line 72 column 14.
 NOTE: Invalid numeric data, 'respondent' , at line 73 column 15.
 NOTE: Invalid numeric data, 'not respondent' , at line 74 column 20.

5) So the questions are:

i) What is the presumably simple change that I clearly need to make in my code in order to make modifications to variables from the 'working asr' data set rather than the  mydata.sas7bdat dataset so that the OUTPUT DATA reflects the changes?

 

ii) How can I read the data NOT as numeric so that I can later create dummy variables for gender, marital status, etc.?

NOTE: what's super weird is if I add a line of code to part 3) above that asks to rename one of the columns, it shows up in the OUTPUT DATA for the 'working asr' but NOT in the maydata.sas7bdat file. 

 

Let me know if I'm missing any details you need to offer guidance! Thanks in advance. 

Kurt_Bremser
Super User

When posting logs, always post all code and messages, so we can see the messages in context.

 

I suspect that qn1f is in fact numeric and has a format attached which displays strings like "male" and "female". Same for respondent. Maxim 3 (Know Your Data): run PROC CONTENTS on the dataset.

 

PS your path ending in sasuser.v94 should be available as the SASUSER library, unless the username in the path is someone else's.

ameliasalem
Obsidian | Level 7

Thank you, will do in the future. The log would not have fit because of the contents of this data set, it recorded hundreds of lines of code with details about the variables.. 

 

So, I used PROC CONTENTS as recommended and you are right. In the 'Alphabetic List of Variables and Attributes' table in the RESULTS tab, the 'Type' is listed as Num for all variables in this data set.

ameliasalem_1-1690932381535.png

Based on the FREQ table below, since these are read as 'text/categorical' variables but are assigned numeric values...

1) how can I filter out 'don't know' and 'refused'

 

ameliasalem_0-1690932351570.png

This is the code I used instead to get access to the data set that I was able to run because I had previously imported it. Now I have a 'working' version that I can use. 

proc import out=ASR1 datafile='/home/xxxxxxx/2019 ASR_Public_Use_File.dta';
run;

proc contents data=ASR1;
run;

data asr;
set ASR1;
run;

/*Run PROC CONTENTS to confirm that all are numeric*/
PROC CONTENTS data=ASR1;
run;

/*Check frequency distro and determine if other variables
besides male/female present and need to be filtered out*/
PROC FREQ data=ASR1;
TABLES qn1f;
run;

data asr;
set asr;
rename qn1f=male;
run;

 

 

 

 

 

ballardw
Super User

Regardless of the values or types of the variables this will cause the the conversion message to appear:

if qn1f='male' then qn1f=1;
else if qn1f='female' then qn1f=0;

Reason: if Qn1f is indeed a text variable with the value of 'male' then attempting to assign the numeric value of 1 to that variable will require SAS to convert the 1 to character to do the assignment.

If Qn1f is not in fact a character variable then SAS attempts to convert the text 'male' to a numeric value to compare with the value of the variable, again generating the conversion message, and failing because 'male' is not convertible to numeric with the default SAS tools used.

 

Exact same issue with the other variable.

 

When the data is provided in the form of a SAS data set then it is up to you to examine it, such as with Proc Contents, to understand what was provided as you don't "read" or "input" SAS data sets.

 

Since you are importing STATA data then you need to read and understand this bit from the online documentation of Proc Import and DTA files:

Stata stores value labels within the data file. The value labels are converted to format library entries as they are read with the IMPORT procedure. The name of the format includes its associated variable name modified to meet the requirements of format names. The name of the format is also associated with a variable in the SAS data set.

 

Which means it is quite likely that the STATA variables are numeric and had Value Labels applied to make the text appear as 'male'/'female' when then underlying values were actually 1/0 or similar. SAS automatically converts such to formats and uses them, that is what "the format is also associated with" means above.

So I suspect that you do not need to convert any of these values at all.

You can test that by doing something like this (if you don't quite understand what proc contents tells you)

Proc freq data=asr;
    tables qn1f respondent;
    format qn1f respondent;
run;

The Format statement without a format following the variable names clears the format associated with the variable. When used in a Proc like this that behavior only applies to the specific proc results.

 

ameliasalem
Obsidian | Level 7

Thank you for this - so I did create a frequency table (see below). From the data set's user guide, these are the values assigned to the categories. 

 

male = 1

female = 2

**So I created a new dummy variable read from this column, just in case. 

 

1) how do I filter out the values in the variable that I don't want (i.e. don't know and refused)? 

ameliasalem_0-1690932351570.png

ameliasalem_0-1690933548594.png

 

Tom
Super User Tom
Super User

Why do you think it makes any difference in the analysis whether the value are coded 1/2 or 1/0 ?

You should not need to recode the variable.  

 

What analysis are you trying to perform?

What procedure are you using?

 

If you want to look at the definition of the formats that were created then use PROC FORMAT with the FMTLIB option.  Or perhaps make a dataset of the formats by using the CNTLOUT= option.

 

From your picture of the code book it looks like you should recode 8 and 9 to one of the 27 missing values so that are excluded from your analysis automatically.  I don't know if STATA has an SPSS like command that lets you tell that it should treat some otherwise valid values as meaning misisng.  But in SAS it is best to get set the values to missing to begin with and then the procedures will treat them properly automatically.

 

To have the value print "pretty" you might need to also redefine the format or define a new format and use that with the recoded variable.

 

proc format ;
value gender 
  1='1 Male' 2='2 Female' .d='Do not know' .r='Refused'
;
run;

data step1;
  set have;
  gender=qn1f;
  if gender=8 then gender=.d ;
  else if gender=9 then gender=.r ;
  format gender gender.;
run;
proc freq data=step1;
  tables gender / missprint;
run;
ameliasalem
Obsidian | Level 7

Hi Tom,

Since I am a relatively new user, in my mind, in order to move through a simple OLS regression, I just want to stick with the dummy concept that I am familiar with (0/1). It will help me stay organized. So I used the code below to create a new dummy variable (yay!) but, now I still have 'don't know' and 'refused' so maybe I will try first coding those as you've advised below with .d and .r to see if I can't remove them. I'm going to have to do that with some of the other variables as well (marital status, school, childed..). 

 

Also, anything related to Stata, your guess is as good as mine because at this point, I know as much about Stata as I do about SAS and R. 

data asrworking;
set asr(keep=personid numppl qn1f qn1d qn1c qn2a qn4c qn26h qn31f_months qn1jyear);
run;

data asrworking;
set asrworking;
rename qn1f=sex;
rename qn1d=age;
rename qn1c=married;
rename qn2a=school;
rename qn4c=english;
rename qn26h=childed;
rename qn31f_months=tanf;
rename qn1jyear=arrived;
d14=0; if arrived=2014 then d14=1;
d15=0; if arrived=2015 then d15=1;
d16=0; if arrived=2016 then d16=1;
d17=0; if arrived=2017 then d17=1;
d18=0; if arrived=2018 then d18=1;
male=0; if sex=1 then male=1;
run;

I'll give those a go and will try any other tactics/code that others recommend. 

Tom
Super User Tom
Super User

PROC GLM does OLS.  Just use a CLASS statement and it will create the dummy variables for you.

ameliasalem
Obsidian | Level 7
I am trying to show all work/steps for my own future reference because this is an exercise to allow me to get more familiar with the data set for more complex analyses down the road. I understand there are shortcuts and this is a helpful note (I'll check it out!)
ameliasalem
Obsidian | Level 7

Hi Tom,

Here are the errors and the RESULTS that I get from using the code (I had already pre-labeled sex and did not yet create dummy in this scenario) and it looks like it worked BUT now, how do I remove D and R?:

ameliasalem_0-1690946763320.png

 

 
 1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 68         
 69         proc format;
 70         value sex;
 NOTE: Format SEX is already on the library WORK.FORMATS.
 NOTE: Format SEX has been output.
 71         1='1 Male' 2= '2 Female' .d='don t know' .r='refused';
            _
            180
 ERROR 180-322: Statement is not valid or it is used out of proper order.
 
 NOTE: The previous statement has been deleted.
 72         run;
 
 NOTE: PROCEDURE FORMAT used (Total process time):
       real time           0.00 seconds
       user cpu time       0.00 seconds
       system cpu time     0.00 seconds
       memory              285.03k
       OS Memory           26444.00k
       Timestamp           08/02/2023 03:23:48 AM
       Step Count                        313  Switch Count  0
       Page Faults                       0
       Page Reclaims                     20
       Page Swaps                        0
       Voluntary Context Switches        0
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           0
       
 NOTE: The SAS System stopped processing this step because of errors.
 73         
 
 
 74         data asrworking;
 75         set asrworking;
 76         if sex=8 then sex=.d;
 77         else if sex=9 then sex=.r;
 78         format sex sex.;
 79         run;
 
 NOTE: There were 4905 observations read from the data set WORK.ASRWORKING.
 NOTE: The data set WORK.ASRWORKING has 4905 observations and 15 variables.
 NOTE: DATA statement used (Total process time):
       real time           0.00 seconds
       user cpu time       0.00 seconds
       system cpu time     0.01 seconds
       memory              2081.00k
       OS Memory           28116.00k
       Timestamp           08/02/2023 03:23:48 AM
       Step Count                        314  Switch Count  2
       Page Faults                       0
       Page Reclaims                     353
       Page Swaps                        0
       Voluntary Context Switches        12
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           1288
       
 
 80         proc freq data=asrworking;
 81         tables sex / missprint;
 82         run;
 
 NOTE: There were 4905 observations read from the data set WORK.ASRWORKING.
 NOTE: PROCEDURE FREQ used (Total process time):
       real time           0.01 seconds
       user cpu time       0.02 seconds
       system cpu time     0.00 seconds
       memory              2150.59k
       OS Memory           27860.00k
       Timestamp           08/02/2023 03:23:48 AM
       Step Count                        315  Switch Count  2
       Page Faults                       0
       Page Reclaims                     257
       Page Swaps                        0
       Voluntary Context Switches        13
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           272
       
 
 83         
 84         
 85         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 95         

 

Tom
Super User Tom
Super User

If you don't want the missing values to print in the PROC FREQ output then remove the MISSPRINT option.

 

If you want to remove the missing values from even being SEEN by PROC PRINT then use a WHERE statement.

where not missing(sex);

You ended the VALUE statement before you completed all of the elements of it by inserting an extra semicolon.

SAS statements end at the semicolon.  End of line characters are just white space to be ignored.  One way to avoid making that mistake (and make it easier to see them when reviewing the code) is to get in the habit of placing the semicolon on its own line when you have multiple line statement.  Like how you place the END statement on its own line when you write a DO/END block.

value sex
  1='1 Male'
  2= '2 Female' 
  .d="don't know" 
  .r='refused'
;
ameliasalem
Obsidian | Level 7
I appreciate you trying to help but, the variable still appears as a variable with four possible values, rather than a dummy variable with two values (0,1). Thanks for trying!
Tom
Super User Tom
Super User

@ameliasalem wrote:
I appreciate you trying to help but, the variable still appears as a variable with four possible values, rather than a dummy variable with two values (0,1). Thanks for trying!

Of course there are 4 values.  But if you make two of them missing values then they will be ignored by any valid analysis code.  You cannot do a regression with a missing value.

 

Making Boolean variables is trivial in SAS because SAS will evaluate a Boolean expression as either 1 for TRUE or 0 for FALSE.   So for GENDER if you want to make a variable that was 1 when the GENDER was FEMALE then you could just do:

female = (gender='FEMALE') ;

Of if GENDER is a numeric variable with 2 meaning FEMALE you would code that as:

female = (gender=2) ;

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 17 replies
  • 2132 views
  • 8 likes
  • 4 in conversation