BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
smart
Fluorite | Level 6

Here is my code so far:

libname mtsu '/courses/ddb976e5ba27fe300/PSY6560/';
run;

data cleansurvey2011;
set mtsu.survey2011;
array dem [8];
array anx [50];
array dep [18];
do i = 1 to 8;
do i = 8 to 50;
do i = 50 to 68;
if dem[i] = . then dem[i] = (missing data);
if anx[i] = . then anx[i] = (missing data);
if dep[i]= . then dep[i] = (missing data);
depreversed = dep18+1 - dep1;
end;
end;
end;
avg = anx;
avg = dep;
if Dem1 = 0 then Feedback = 'Male';
else if Dem1 = 1 then Feedback = 'Female';
if Dem2 = 0 then Feedback = 'No';
else if Dem2 = 1 then Feedback = 'Yes';
if Dem3 = 0 then Feedback = 'No';
else if Dem3 = 1 then Feedback = 'Yes';
if Dem4 = 0 then Feedback = 'No';
else if Dem4 = 1 then Feedback = 'Yes';
if Dem5 = 0 then Feedback = 'No';
else if Dem5 = 1 then Feedback = 'Yes';
if Dem6 = 0 then Feedback = 'No';
else if Dem6 = 1 then Feedback = 'Yes';
if Dem7 = 0 then Feedback = 'No';
else if Dem7 = 1 then Feedback = 'Yes';
if Dem8 = 0 then Feedback = 'No';
else if Dem8 = 1 then Feedback = 'Yes';
proc contents data = mtsu.survey2011;
run;
proc freq data = mtsu.survey2011;
tables dem;
run;
proc print data = cleansurvey2011;
var anx dep;
run;

 

 

Here is what I am trying to do:

A SAS data set called survey2011is stored in the '/courses/ddb976e5ba27fe300/PSY6560/' folder in SAS Studio.

 

Creates a temporary data set called cleansurvey2011. The cleansurvey2011 data set should be based on the survey2011data set, but should have the following additional characteristics.

 

  • The possible values for the demographic questions are 0 and 1. Any other value should be recoded as missing data. Consider using the notin( ) function.

 

  • The possible values for the anxiety questions are 1, 2, and 3. Any other value should be recoded as missing data.

 

  • The possible values for the depression questions are 1, 2, 3, 4, 5, 6, and 7. Any other value should be recoded as missing data.

 

  • Reverse score the following depression questions: 3, 4, 8, 9, 10, and 11. A quick way to reverse code a variable is: reversed = scalemax+1  – original. For example,

 

anxiety1r = 4 – anxiety1;

 

  • Create an average anxiety score for each person.

 

  • Create an average depression score for each person.

 

  • Assigns permanent labels and formats to each demographic variable.

 

Variable Name

 

Dem1

Sex (0 = Male, 1 = Female)

Dem2

Do you reside in Rutherford County? (0 = No, 1 = Yes)

Dem3

Are you married? (0 = No, 1 = Yes)

Dem4

Do you have children? (0 = No, 1 = Yes)

Dem5

Do you own your own home? (0 = No, 1 = Yes)

Dem6

Do you own a vehicle? (0 = No, 1 = Yes)

Dem7

Do you work full-time? (0 = No, 1 = Yes)

Dem8

Are you currently enrolled at MTSU? (0 = No, 1 = Yes)

 

  • Print descriptive statistics for average anxiety and average depression scores.

 

  • Print frequencies for the demographic variables.

 

Here is the Survey2011 Dataset:

 

 Obs dem1 dem2 dem3 dem4 dem5 dem6 dem7 dem8 anx1 anx2 anx3 anx4 anx5 anx6 anx7 anx8 anx9 anx10 anx11 anx12 anx13 anx14 anx15 anx16 anx17 anx18 anx19 anx20 anx21 anx22 anx23 anx24 anx25 anx26 anx27 anx28 anx29 anx30 anx31 anx32 anx33 anx34 anx35 anx36 anx37 anx38 anx39 anx40 anx41 anx42 anx43 anx44 anx45 anx46 anx47 anx48 anx49 anx50 dep1 dep2 dep3 dep4 dep5 dep6 dep7 dep8 dep9 dep10 dep11 dep12 dep13 dep14 dep15 dep16123456789101112131415

00101110123202231332223132212223231233122232313222223222123455324464645564
11100010233222312123122311132321321322212103111221223130225516457254535567
10200111011322231323323230233223323333332123331322132322222652764534365584
01111111312221221122231312221330121322032221223312212312322425544656545374
10010000231322231122322333223033112222322311101222223123115766344663753535
01000011122221223322323231123222311132223212221323323232234676435742173465
21110110202113131233231133312313321332313323232122223122323642544656463446
01012110212112322323332121212232122232312332322333221313324355663456235466
00100020202233122131232221212322132322212233233012213322133641462833353785
10111011232323223323233232322321313121233323330222223132315645456666452454
00200111232323323233321332330123222221222231223223223222336535725445554225
00010001223321211022111133333311223211133322322322213330222247524544236254
00111101222313232323222233123133131332333313302121223322222253543253426463
01000111211021122222132333233123223233222131233113323231316636667666633353
01000101332131233322122121322132101322322122232233313212225623463655157344
1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Ok. There's a lot here that you need to understand and do but the previous link I supplied regarding arrays is pretty clear and covers a lot of what you're trying to do. 

 

I would highly suggest breaking this down into steps so you can do one thing at a time and make sure that works first.

Let's start with these two.

1. Ensure values have ONLY the values they need to 

2. Apply formats 

3. Apply labels

 

For #1 if you want to check if a value has the value you expect you can use the NOT IN () operator as indicated in your question. 

You should be able to build on this code and get about half way there. Not sure if the misspelling of not in() is your typo or your professors/testor. 

Post back what you have when you're closer to the other parts and what you've tried for those and I'll take another look. 

 

proc format;
value sex_fmt
0 = 'Male'
1 = 'Female';

value YesNo_fmt
0 = 'No'
1 = 'Yes';

run;

data cleanSurvey2011;
set mtsu.Survey2011;

array dem(8) dem1-dem8;

*loop through;
do i=1 to dim(dem);
if dem(i) not in (0, 1) /*list of valid values*/ 
       then dem(i) = .; *set to missing;
end;

*repeat for each set of variables;

*apply formats;
format dem1 sex_fmt. dem2-dem8 YesNo_fmt.;

*create labels;
label dem1 = 'Sex' 
         dem2 = 'Do you reside in Rutherford County'; 

run;

*check results, print first 5 records;
proc print data= cleanSurvey2011 (obs=5) labels noobs;
run;

 

View solution in original post

3 REPLIES 3
smart
Fluorite | Level 6

Here's what I think I'm doing:

 

libname mtsu '/courses/ddb976e5ba27fe300/PSY6560/';
run;

/* Creates a temporary data set called cleansurvey2011. */
/* The cleansurvey2011 data set should be based on the survey2011 data set, */
/* but should have the following additional characteristics. */
data cleansurvey2011;
set mtsu.survey2011;
/* The possible values for the demographic questions are 0 and 1. */
/* Any other value should be recoded as missing data. Note. */
/* Consider using the notin( ) function. */
array dem [8];
do i = 1 to 8;
if dem[i] = . then dem[i] = (missing data);
/* The possible values for the anxiety questions are 1, 2, and 3. */
/* Any other value should be recoded as missing data. */
array anx [50];
do i = 8 to 50;
if anx[i] = . then anx[i] = (missing data);
/* The possible values for the depression questions are 1, 2, 3, 4, 5, 6, and 7. */
/* Any other value should be recoded as missing data. */
array dep [18];
do i = 50 to 68;
if dep[i]= . then dep[i] = (missing data);
/* Reverse score the following depression questions: 3, 4, 8, 9, 10, and 11. A quick way to reverse code a variable is: */
/* reversed = scalemax+1 – original */
depreversed = dep18+1 - dep1;
end;
end;
end;
/* Create an average anxiety score for each person. */
avg = anx;

/* Create an average depression score for each person. */
avg = dep;
/* Assigns permanent labels and formats to each demographic variable. */
if Dem1 = 0 then Feedback = 'Male';
else if Dem1 = 1 then Feedback = 'Female';
if Dem2 = 0 then Feedback = 'No';
else if Dem2 = 1 then Feedback = 'Yes';
if Dem3 = 0 then Feedback = 'No';
else if Dem3 = 1 then Feedback = 'Yes';
if Dem4 = 0 then Feedback = 'No';
else if Dem4 = 1 then Feedback = 'Yes';
if Dem5 = 0 then Feedback = 'No';
else if Dem5 = 1 then Feedback = 'Yes';
if Dem6 = 0 then Feedback = 'No';
else if Dem6 = 1 then Feedback = 'Yes';
if Dem7 = 0 then Feedback = 'No';
else if Dem7 = 1 then Feedback = 'Yes';
if Dem8 = 0 then Feedback = 'No';
else if Dem8 = 1 then Feedback = 'Yes';

/* Print descriptive statistics for average anxiety and average depression scores. */
/* */
/* Print frequencies for the demographic variables. */
/* */
proc contents data = mtsu.survey2011;
run;
proc freq data = mtsu.survey2011;
tables dem;
run;
proc print data = cleansurvey2011;
var anx dep;
run;

Reeza
Super User

Ok. There's a lot here that you need to understand and do but the previous link I supplied regarding arrays is pretty clear and covers a lot of what you're trying to do. 

 

I would highly suggest breaking this down into steps so you can do one thing at a time and make sure that works first.

Let's start with these two.

1. Ensure values have ONLY the values they need to 

2. Apply formats 

3. Apply labels

 

For #1 if you want to check if a value has the value you expect you can use the NOT IN () operator as indicated in your question. 

You should be able to build on this code and get about half way there. Not sure if the misspelling of not in() is your typo or your professors/testor. 

Post back what you have when you're closer to the other parts and what you've tried for those and I'll take another look. 

 

proc format;
value sex_fmt
0 = 'Male'
1 = 'Female';

value YesNo_fmt
0 = 'No'
1 = 'Yes';

run;

data cleanSurvey2011;
set mtsu.Survey2011;

array dem(8) dem1-dem8;

*loop through;
do i=1 to dim(dem);
if dem(i) not in (0, 1) /*list of valid values*/ 
       then dem(i) = .; *set to missing;
end;

*repeat for each set of variables;

*apply formats;
format dem1 sex_fmt. dem2-dem8 YesNo_fmt.;

*create labels;
label dem1 = 'Sex' 
         dem2 = 'Do you reside in Rutherford County'; 

run;

*check results, print first 5 records;
proc print data= cleanSurvey2011 (obs=5) labels noobs;
run;

 

ballardw
Super User

This is double-plus-ungood:

do i = 1 to 8;
do i = 8 to 50;
do i = 50 to 68;

Except under very unusual situations (I can't think of one right now) each loop should have its own index variable

 

Perhaps

do i = 1 to 8;
do j = 1 to 50;
do k = 1 to 18;

What your three statements did was

 

1) set I to 1

then immediately set I to 8, then again to 50.

Since the value of I=50 the arrays dem dep cause errors with subscripts because they do not have enough elements.

So DEM would use I values, ANX would use J values, which likely should be 1 to 50, and DEP would K values of 1 to 18

 

 

I don't know what you are attempting with

if dem[i] = . then dem[i] = (missing data);

but (missing data) is syntactically incorrect.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 951 views
  • 0 likes
  • 3 in conversation