Hi all,
I created 3 DO loops so that I can cycle through 3 different types of variables (dependent, categorical, and continuous). The code runs, but I noticed that the number of observations went from 950 to 38,000 and the number of variables from 94 to 101. I expected to create only 3 new variables. Any ideas why this is happening? I believe the extra observations are throwing off the results of my subsequent steps.
data perm.temp;
set perm.mentalhealth;
array var_list[9] trouble_sleeping hurting_yourself interest depressed little_energy appetite feeling_bad
concentrating moving_slowly;
do i=1 to dim(var_list);
VarName=vname(var_list(i));
Outcome=var_list[i];
output;
end;
array categorical[22] gender age scalp_lesions postauricular erythema eyelid_involvement cheilitis flexural_erythema
xerosis neck_folds nipple_eczema keratosis palmar hand_eczema ichthyosis foot_eczema race education_final insurance
alopecia pityriasis pain_severeB;
do j=1 to dim(categorical);
categorical_=vvalue(categorical(j));
output;
end;
array npredictors[9] SCORAD EASI BSA ADSI POEM_SCORE dlqi_score FIVED_SCORE RL_SCORE flare;
do k=1 to dim(npredictors);
npredictors_=vname(npredictors(k));
output;
end;
format Depression Depressionn. Anxiety Anxietyy. interest interestt. depressed _depressedd. trouble_sleeping trouble_sleepingg.
little_energy little_energyy. appetite appetitee. feeling_bad feeling_badd. concentrating concentratingg.
moving_slowly moving_slowlyy. hurting_yourself hurting_yourselff. PHQ9_SCORE PHQ9_SCORE_. PHQ2_SCORE PHQ2_SCORE.
gender gender. race race. education_final education. insurance insurance. scalp_lesions scalp_lesionss. postauricular postauricularr.
erythema erythemaa. eyelid_involvement eyelid_involvementt. cheilitis cheilitiss. flexural_erythema flexural_erythemaa.
xerosis xerosiss. neck_folds neck_foldss. nipple_eczema nipple_eczemaa. keratosis keratosiss. palmar palmarr.
hand_eczema hand_eczemaa. ichthyosis ichthyosiss. foot_eczema foot_eczemaa. age age_bin_. alopecia alopeciaa.
pityriasis pityriasiss. pain_severeB painn. ;
run;
So first, I don't know the answer to your question, because I find your code difficult to read. It would help us all if your code were properly indented and formatted, and if lines didn't extend beyond the right edge of the text box. Here is an example of proper indentation and formatting:
data perm.temp;
    set perm.mentalhealth;
    array var_list[9] trouble_sleeping hurting_yourself interest depressed
                      little_energy appetite feeling_bad concentrating moving_slowly;
    do i=1 to dim(var_list);
        VarName=vname(var_list(i));
        Outcome=var_list[i];
        output;
    end;
So I am asking you to take the time to reformat your code so that it is more readable; then I would be happy to try to figure out what you are doing wrong and what the fix is.
Not only will people here in the SAS Communities benefit from properly indented and formatted code, but in the long run you will benefit too, because errors are much easier to diagnose.
data perm.temp;
set perm.mentalhealth;
array var_list[9] trouble_sleeping hurting_yourself interest depressed little_energy appetite feeling_bad
concentrating moving_slowly;
do i=1 to dim(var_list);
VarName=vname(var_list(i));
Outcome=var_list[i];
output;
end;
array categorical[22] gender age scalp_lesions postauricular erythema eyelid_involvement cheilitis flexural_erythema
xerosis neck_folds nipple_eczema keratosis palmar hand_eczema ichthyosis foot_eczema race education_final insurance
alopecia pityriasis pain_severeB;
do j=1 to dim(categorical);
categorical_=vvalue(categorical(j));
output;
end;
array npredictors[9] SCORAD EASI BSA ADSI POEM_SCORE dlqi_score FIVED_SCORE RL_SCORE flare;
do k=1 to dim(npredictors);
npredictors_=vname(npredictors(k));
output;
end;
format Depression Depressionn. Anxiety Anxietyy. interest interestt. depressed _depressedd. trouble_sleeping trouble_sleepingg.
little_energy little_energyy. appetite appetitee. feeling_bad feeling_badd. concentrating concentratingg.
moving_slowly moving_slowlyy. hurting_yourself hurting_yourselff. PHQ9_SCORE PHQ9_SCORE_. PHQ2_SCORE PHQ2_SCORE.
gender gender. race race. education_final education. insurance insurance. scalp_lesions scalp_lesionss. postauricular postauricularr.
erythema erythemaa. eyelid_involvement eyelid_involvementt. cheilitis cheilitiss. flexural_erythema flexural_erythemaa.
xerosis xerosiss. neck_folds neck_foldss. nipple_eczema nipple_eczemaa. keratosis keratosiss. palmar palmarr.
hand_eczema hand_eczemaa. ichthyosis ichthyosiss. foot_eczema foot_eczemaa. age age_bin_. alopecia alopeciaa.
pityriasis pityriasiss. pain_severeB painn. ;
run;
Sorry about that. Does this help?
The point of formatting and indenting is to make things VISUALLY obvious. So I prefer to see four spaces (or one tab) rather than one space when you indent, and eight spaces (or two tabs) for a second level of indentation. Your use of one space isn't particularly visually obvious to me (and I wonder if it is visually obvious to you). I would also want to see each END statement left-aligned with its corresponding DO statement.
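A minimal sketch of that layout, assuming SASHELP.CLASS (which ships with SAS) as a stand-in for the poster's data; the dataset name WORK.INDENT_DEMO and the loop body are illustrative only. Each nesting level adds four spaces, and every END sits in the same column as the DO it closes.

```sas
data work.indent_demo;                      /* hypothetical output name */
    set sashelp.class;                      /* first level: 4 spaces */
    array nums[*] age height weight;
    do i = 1 to dim(nums);                  /* outer DO at the first level */
        if not missing(nums[i]) then do;    /* second level: 8 spaces */
            VarName = vname(nums[i]);
            output;
        end;                                /* lines up with the IF ... DO */
    end;                                    /* lines up with the outer DO */
    drop i;
run;
```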
So no, that doesn't help.
Again, my point is that YOU will benefit from putting a little extra effort into formatting your code well, because it makes coding problems easier to identify. Such formatting also benefits YOU because those of us here in the SAS Communities will find your code easier to debug.
Anyway, I think @Tom has identified the problem correctly.
If you want the new dataset to only have some of the variables then use a KEEP statement to list the variables you want. Or use a DROP statement to list the variables you don't want.
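As a sketch of the KEEP approach, assuming SASHELP.CLASS in place of perm.mentalhealth and a single illustrative array (the dataset name WORK.SLIM is hypothetical), only the variables named on the KEEP statement are written to the output:

```sas
/* SASHELP.CLASS stands in for perm.mentalhealth. */
data work.slim;
    set sashelp.class;
    array nums[*] age height weight;
    do i = 1 to dim(nums);
        VarName = vname(nums[i]);           /* name of the current variable */
        Outcome = nums[i];                  /* its value */
        output;
    end;
    keep Name VarName Outcome;              /* only these reach WORK.SLIM */
    /* Equivalent DROP form: drop i Sex Age Height Weight; */
run;
```

KEEP and DROP are declarative statements, so their position in the step does not matter. The `keep=` data set option on the output name, as in `data work.slim(keep=Name VarName Outcome);`, does the same job.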
But I don't understand what structure you are trying to create.
If you want to convert wide to tall then go all the way. Your new dataset probably needs just a few variables (but not the ones your code is making).
1) Some type of ID that indicates the original observation. If your data does not have one (or more) variables that uniquely identify the rows, then create one.
2) A variable to indicate which of the three types.
3) A variable to hold the variable name.
4) A variable to hold the value. Since you are mixing types, this will need to be a character variable.
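A sketch of that four-variable tall layout, assuming SASHELP.CLASS as stand-in data. The split of its columns into "dependent", "categorical" and "continuous", and the names ID, VarType, VarName and Value, are hypothetical; with the real data the three arrays would be the original 9, 22 and 9 variables. SAS arrays cannot mix character and numeric members, which is why each type gets its own array and the value is stored as text via VVALUE.

```sas
data work.tall;
    set sashelp.class;
    length ID 8 VarType $11 VarName $32 Value $200;
    ID = _n_;                               /* row ID, assuming no unique key exists */
    array dep[*]  weight;                   /* hypothetical "dependent" group */
    array cat[*]  $ sex;                    /* hypothetical "categorical" group */
    array cont[*] age height;               /* hypothetical "continuous" group */

    VarType = 'dependent';
    do i = 1 to dim(dep);
        VarName = vname(dep[i]);
        Value   = strip(vvalue(dep[i]));    /* formatted value as text */
        output;
    end;

    VarType = 'categorical';
    do i = 1 to dim(cat);
        VarName = vname(cat[i]);
        Value   = strip(vvalue(cat[i]));
        output;
    end;

    VarType = 'continuous';
    do i = 1 to dim(cont);
        VarName = vname(cont[i]);
        Value   = strip(vvalue(cont[i]));
        output;
    end;

    keep ID VarType VarName Value;          /* only the four tall-format variables */
run;
```

Here each input row yields 1 + 1 + 2 = 4 output rows, which is intended: the row count grows, but every row now carries an ID, a type, a name and a value.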
What do you think the OUTPUT statement does?
What effect do you think executing an OUTPUT statement every time through a DO loop would have on the number of observations written?
You have three arrays, with 9, 22 and 9 members. You loop over each array separately, which means 40 loop iterations in total. Since every DO loop contains an OUTPUT statement, you get 40 observations in the new dataset for every observation in the old dataset: 950 × 40 = 38,000.
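To see the multiplication on a small scale, here is a sketch with three arrays of 2, 1 and 2 members over SASHELP.CLASS (19 observations, 5 variables); the arrays are arbitrary stand-ins for the original 9, 22 and 9, and WORK.DEMO is a hypothetical name. Each input row reaches an OUTPUT statement 2 + 1 + 2 = 5 times, so the result should have 19 × 5 = 95 observations, and the loop indexes i, j and k add 3 variables.

```sas
data work.demo;
    set sashelp.class;                      /* 19 observations */
    array a1[2] age height;                 /* 2 members */
    array a2[1] weight;                     /* 1 member  */
    array a3[2] age weight;                 /* 2 members */
    do i = 1 to dim(a1);
        output;                             /* 2 rows per input row */
    end;
    do j = 1 to dim(a2);
        output;                             /* 1 more row per input row */
    end;
    do k = 1 to dim(a3);
        output;                             /* 2 more rows per input row */
    end;
run;
/* 19 input rows x 5 OUTPUTs = 95 rows; 5 original + i, j, k = 8 variables */
```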
You create these new variables:
i, j, k (loop indexes)
VarName, Outcome (created in the first loop)
categorical_ (in the second loop)
npredictors_ (in the third loop)
These are 7 additional variables: 94 + 7 = 101.
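One way to confirm which variables a step added is to compare the two tables in the DICTIONARY.COLUMNS metadata. A minimal sketch, assuming SASHELP.CLASS as the source and a small hypothetical WORK.TEMP built from it; for the real data the library and member names would become PERM/TEMP and PERM/MENTALHEALTH.

```sas
/* Hypothetical analogue of perm.temp built from SASHELP.CLASS */
data work.temp;
    set sashelp.class;
    array nums[*] age height weight;
    do i = 1 to dim(nums);
        VarName = vname(nums[i]);
        output;
    end;
run;

/* Columns present in WORK.TEMP but not in SASHELP.CLASS */
proc sql;
    create table work.new_vars as
        select name
            from dictionary.columns
            where libname = 'WORK' and memname = 'TEMP'
              and upcase(name) not in
                  (select upcase(name)
                       from dictionary.columns
                       where libname = 'SASHELP' and memname = 'CLASS');
quit;

proc print data=work.new_vars;
run;
```

In this sketch the listing should show only i and VarName; run against the original tables, the same query should list the 7 variables above.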