Transpose or Array of questions and answers by ID

Maldra · Posted 03-01-2022 11:09 AM

I am trying to put all questions and answers for a given ID on a single row, but am having difficulties doing so. There are some other fields but the main ones are concerned with re-arranging the data are:CaseID QuestionText AnswerText. The other columns are TestPeriod, TestName, RecordPrimaryID CaseWorkDT AssignedOwner ManagerName

Data I have is like this (with the additional columns but that's just extra noise at this point):

CaseID QuestionText AnswerText
1111 Claim Pass
1111 Memos Pass
1111 Procedure Fail
1112 Documented Pass
1112 Sent Pass
1113 Contents Pass
1114 Completed Pass
1114 Investigated Pass

Data Want:

CaseID Q1 A1 Q2 A2 Q3 A3
1111 Claim Pass Memos Pass Procedure Fail
1112 Documented Pass Sent Pass
1113 Contents Pass
1114 Completed Pass Investigated Pass

I first tried doing proc transpose but that didn't seem to get me what I needed. I then tried an array, and it looked like it was working, but we found some weird data stuff going on.

Here is what I was trying below (Note that the number of questions and answers can vary on which tests were doing that given month. I tried using variables for the array and was given errors, but 50 seemed to be enough for any given test):

data work.TM_Assessment_4c;
set work.TM_Assessment_3c;
by CaseID;
array questionarray {50} $400 Q1-Q50;
array answerarray {50} $400 A1-A50;

if first.CaseID then
i = 1;
questionarray(i) = QuestionText;
answerarray(i) = AnswerText;

if last.CaseID then
output;
i+1;
retain Q1-Q50;
retain A1-A50;
run;

proc sql;
create table work.TM_Assessment_5c AS
select TestPeriod
, TestName
, CaseID
, RecordPrimaryID
, CaseWorkDT
, AssignedOwner
, ManagerName
, Q1
, A1
, Q2
, A2
, Q3
, A3
, Q4
, A4
, Q5
, A5
, Q6
, A6
, Q7
, A7
, Q8
, A8
, Q9
, A9
, Q10
, A10
, Q11
, A11
, Q12
, A12
, Q13
, A13
, Q14
, A14
, Q15
, A15
, Q16
, A16
, Q17
, A17
, Q18
, A18
, Q19
, A19
, Q20
, A20
, Q21
, A21
, Q22
, A22
, Q23
, A23
, Q24
, A24
, Q25
, A25
, Q26
, A26
, Q27
, A27
, Q28
, A28
, Q29
, A29
, Q30
, A30
, Q31
, A31
, Q32
, A32
, Q33
, A33
, Q34
, A34
, Q35
, A35
, Q36
, A36
, Q37
, A37
, Q38
, A38
, Q39
, A39
, Q40
, A40
, Q41
, A41
, Q42
, A42
, Q43
, A43
, Q44
, A44
, Q45
, A45
, Q46
, A46
, Q47
, A47
, Q48
, A48
, Q49
, A49
, Q50
, A50
from work.TM_Assessment_4c
order by AssignedOwner, TestPeriod, TestName, RecordPrimaryID
;
quit;

The output data looked like it was what we wanted at first, but the actual data was not correct. We were seeing instances where a case would have only one question and one answer, and while it was showing that in the data, it was also showing additional questions and answers on the same row that had nothing to do with it.

Can anyone help me out?

Thanks all.

Reeza · Posted 03-01-2022 11:24 AM

Quick question - why? I usually spend my time getting my survey data into the format shown to make reporting and analysis easier.

Maldra · Posted 03-01-2022 11:33 AM

That is what the stakeholder requested. I'm not sure on the reasoning.

ballardw · Posted 03-01-2022 11:29 AM

Personally I find your "want" structure to be subject to many problems. I can see a strong possibility the the Q value of "Claim" could end up in many different variables is just processing in order.

I suggest something like this may be more appropriate where each "question text" becomes the variable name.

data example;
 input id $ Activity :$15. Status :$6.;
datalines;
1111  Claim   Pass
1111  Memos  Pass
1111  Procedure   Fail
1112  Documented Pass
1112  Sent    Pass
1113  Contents   Pass
1114  Completed  Pass
1114  Investigated  Pass
;


proc transpose data=example out=trans (drop=_name_);
   by id;
   var status;
   id Activity;
run;

Note providing some data in the form of a data step that you can run to duplicate the results and posted into a text box.

Maldra · Posted 03-01-2022 11:41 AM

Unfortunately that won't work really since there are thousands of different questions since there are thousands of test with different questions. It was just dummy data that I provided; the actual questions are long sentences. I may just have to go to them with the "have" data format if I can't get the "want" format.

ballardw · Posted 03-01-2022 12:24 PM

@Maldra wrote:
Unfortunately that won't work really since there are thousands of different questions since there are thousands of test with different questions. It was just dummy data that I provided; the actual questions are long sentences. I may just have to go to them with the "have" data format if I can't get the "want" format.

Possibly even more reason not to make a data set in that format.

"Long sentences" might typically be a Label for a specific variable.

I would take a document, you do have one don't you, with all the "long sentence" values and make a data set or informat that links each specific sentence to a variable name to a specific order number. Then you have something that could add "Q1" or "A1" consistently.

Example:

proc format;
invalue text2order (upcase)

'CLAIM'        =  1 
'MEMOS'        =  2
'PROCEDURE'    =  3
'DOCUMENTED'   =  4
'SENT'         =  5
'CONTENTS'     =  6
'COMPLETED'    =  7
'INVESTIGATED' =  8
;
run;

data example;
  input CaseID :$10. QuestionText :$40. AnswerText :$10.;
datalines;
1111  Claim   Pass
1111  Memos  Pass
1111  Procedure   Fail
1112  Documented Pass
1112  Sent    Pass
1113  Contents   Pass
1114  Completed  Pass
1114  Investigated  Pass
;

data want;
   set example;
   array q(8) $ 40;
   array a(8) $ 10;
   retain q: a:;
   by caseid;
   if first.caseid then call missing(of q(*), of a(*));
   q[input(questiontext,text2order.)]= questiontext;
   a[input(questiontext,text2order.)]= AnswerText;
   if last.caseid;
   keep caseid q1-q8 a1-a8;
run;

If you have any form of electronic document with the "questions" you should be able to create the invalue using Proc Format and a CNTLIN data set.

ghosh · Posted 03-01-2022 11:58 AM

Could you explain the logic of how the data is stored?

What I have observed is Q1 is always the first.caseid which you will save as QuestionText and A1 as the Answertext

Then you remove the first.caseid record, in the next iteration Q2 is the first.caseid.

and so on for Q3

Will this iterative process work with your survey data structre?

Tom · Posted 03-01-2022 11:59 AM

Before you can make the data you need to answer this question:

Do you want the same question/answer pair to always end up in the same pair of variables across the cases?

For example do you want CLAIM to always go into Q1? And if a case does not have the CLAIM quesiton that it has missing values of Q1? Or perhaps missing values for A1 but Q1 still says CLAIM?

In that case you need to first map the Question text to the variable number. Do you have a full set of possible values? Or do you want to just use the set of questions in the current file? If the later will you ever get any other files of this type? Do they need to be mapped into the same variables?

Or is it ok that for the first case Q1 is CLAIM and for the second case Q1 is DOCUMENTED?

To just numbers them sequentially within each CASE is the easiest.

I like to use PROC SUMMARY.

First you need to know the maximum number of questions any single CASE could have.

Let's call that number N and stick it into a macro variable named N. For your dummy data N is 3.

data have ;
  input CaseID :$10. QuestionText :$40. AnswerText :$10.;
cards;
1111  Claim             Pass
1111  Memos             Pass
1111  Procedure         Fail
1112  Documented        Pass
1112  Sent              Pass
1113  Contents          Pass
1114  Completed         Pass
1114  Investigated      Pass
;

%let N=3;
proc summary data=have nway ;
  by caseid;
  output out=want(drop=_type_) idgroup (out[&n] (questiontext answertext)=);
run;

Results:

       Case              Question      Question        Question     Answer    Answer    Answer
Obs     ID     _FREQ_    Text_1        Text_2           Text_3      Text_1    Text_2    Text_3

 1     1111       3      Claim         Memos           Procedure     Pass      Pass      Fail
 2     1112       2      Documented    Sent                          Pass      Pass
 3     1113       1      Contents                                    Pass
 4     1114       2      Completed     Investigated                  Pass      Pass

Maldra · Posted 03-01-2022 02:01 PM

Each test has a maximum of about 45 questions, but there are a lot of different tests with a lot of different questions. If it were that way, there would be thousands of columns, which is not what they were looking for. They just wanted all of the questions and answers one after the other in a single row.

Reeza · Posted 03-01-2022 02:10 PM

Looks like Tom's answer will work for you then 🙂

ballardw · Posted 03-01-2022 06:00 PM

Nothing to identify what the source of the questions is???

If there were such a field then this is partially defensible but I bet you are going to deal with issues of dates and numeric values converting to text somewhere along the line.

Maldra · Posted 03-08-2022 05:17 PM

After talking to the stakeholders some more, the reason they want it in this format is so they can enter it into a system, and that is how it takes the data. The solution of having gaps also wouldn't work. It needs to look like my original post. Is this something SAS is able to do?

Tom · Posted 03-08-2022 06:13 PM

You have a couple of answer to produce what you asked for.

Please explain it what way they are not what you need?

If they are entering into a "system" what type of file do they need?

Transpose or Array of questions and answers by ID

Re: Transpose or Array of questions and answers by ID

Re: Transpose or Array of questions and answers by ID

Re: Transpose or Array of questions and answers by ID

Re: Transpose or Array of questions and answers by ID

Re: Transpose or Array of questions and answers by ID

Re: Transpose or Array of questions and answers by ID

Re: Transpose or Array of questions and answers by ID

Re: Transpose or Array of questions and answers by ID

Re: Transpose or Array of questions and answers by ID

Re: Transpose or Array of questions and answers by ID

Re: Transpose or Array of questions and answers by ID

Re: Transpose or Array of questions and answers by ID

SAS Innovate 2025: Call for Content

Classroom Training Available!