BookmarkSubscribeRSS Feed
LisaYIN9309
Obsidian | Level 7

Hello,

 

I am working on a survey dataset and I come across this problem to count "Which is the last question a person stopped answer?"

 

My input data looks like:

- the 1st row, outdoor, indoor is NOT part of the dataset, just put in here to illustrate what I am hoping to do:

- Each row is a person, each column is a question (there were 300+ questions)

- And missing values are different for each question, but each group do have same pattern (e.g., for outdoor, 9 is missing, for indoor, 99 is missing, something like this)

 OutdoorIndoor
Person_IDQ1Q2Q3Q4Q5Q6Q7Q8
11921219999
2232235998
313212627
42582599599
516516767
6913969943
792996151
811116565
922226666
1011119999998
1129229979999
12191199999999
1399993131
1499993333

 

- My research question is "Which is the last question a person stops answer?", E.g.,

 

I want my output data to look like (a frequency table)

- N is # of people who stopped at a question that falls into this category

 

 N%
Outdoor  
indoor  
Food  
drink  
  

 

The challenge I am facing are:

- How to count 'Stopped answering", e.g., in the example input data, person 1 have missing data on Q2, Q7, Q8, but Q8 is where he stopped, so each person only has 1 question

something like:

Person_IDSTOPSTOP_GROUP
1Q7indoor
2Q7indoor
3Q8indoor
4  

 

- There are 300+ columns and 20+ 'groups', and the column/variable names are not as simple as Q1 - Q300+, they each have wired name (XHV, DWE, DIO....), I made them as Q1 - Q30 just for demonstration

 

Any idea what is the best approach for creating the about 'ideal output' frequency table?

 

Any help will be greatly appreciated!

 

Thank you so much!

Yijun

5 REPLIES 5
novinosrin
Tourmaline | Level 20

Are you sure Person2 stopped answering at Q7? 

 

223223599

8

 

Your table suggests he/she has answered Q8???????

LisaYIN9309
Obsidian | Level 7

Thanks @novinosrin for your question.

 

I just thought about it and you are right, in this fake data I put together, yes, Person2 didn't stop answering at Q7.

 

Really good point, which brings more questions about how to best account for ' at which question a person stops answering'. I translate it into 'which is the last question that has a missing value', but now that I think, it should be 'which is the last question that has a missing value AND all questions after that don't have valid value'

 

Really appreciate your help!

novinosrin
Tourmaline | Level 20

You wrote -'which is the last question that has a missing value AND all questions after that don't have valid value'

 

If the above holds true, then it's a piece of cake

 

data have;
input Person_ID	Q1	Q2	Q3	Q4	Q5	Q6	Q7	Q8;
cards;
1	1	9	2	1	2	1	99	99
2	2	3	2	2	3	5	99	8
3	1	3	2	1	2	6	2	7
4	2	5	8	2	5	99	5	99
5	1	6	5	1	6	7	6	7
6	9	1	3	9	6	99	4	3
7	9	2	9	9	6	1	5	1
8	1	1	1	1	6	5	6	5
9	2	2	2	2	6	6	6	6
10	1	1	1	1	99	99	99	8
11	2	9	2	2	99	7	99	99
12	1	9	1	1	99	99	99	99
13	9	9	9	9	3	1	3	1
14	9	9	9	9	3	3	3	3
;

data want;
set have;
array t(*) Q:;
do n=dim(t) to 1 by -1;
if (t(n) in (99,9)) then do;
stop=vname(t(n));
return;
end;
end;
run;
novinosrin
Tourmaline | Level 20

And to address -

- There are 300+ columns and 20+ 'groups', and the column/variable names are not as simple as Q1 - Q300+, they each have wired name (XHV, DWE, DIO....), I made them as Q1 - Q30 just for demonstration

 

Use Variable lists  with a double -- 

 

XHV, DWE, DIO

 

can be written as array t(*) XHV-- DIO; *xhv your first variable and dio is last variable in the example list;

ballardw
Super User

I have dealt with this exact question previously though I did not have SAS at the time and had to use less flexible tools.

There are some questions that answers to would help.

1) Are all of the "question" answers numeric or all character or do you have numeric character responses intermixed?

2) Are all of the question variables actually named Qxx where xx is some numeric value AND the value is at least in order as asked in the question instrument. (I ask as I have seen questionnaires that change over time and then next thing you know Q121 is the insert after Q12).

 

If all of the question variables are of the same type, numeric or character then this should work (as long as you don't already have a variable named QUEST to conflict with the array name)

data want;
   set have;
   array quest q: ;
/* alternatively: array quest q1 -- lastquestionvariablname;*/
   do i= dim(quest) to 1 by (-1);
      if not missing(Quest[i]) then do;
         LastQuest = vname(quest[i]);
         leave;
      end;
   end;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 914 views
  • 0 likes
  • 3 in conversation