BookmarkSubscribeRSS Feed
Maldra
Fluorite | Level 6

I am trying to put all questions and answers for a given ID on a single row, but am having difficulties doing so.  There are some other fields but the main ones are concerned with re-arranging the data are:CaseID QuestionText AnswerText.  The other columns are TestPeriod, TestName, RecordPrimaryID CaseWorkDT AssignedOwner ManagerName

 

Data I have is like this (with the additional columns but that's just extra noise at this point):


CaseID           QuestionText          AnswerText
1111                Claim                     Pass
1111                Memos                  Pass
1111                Procedure             Fail
1112                Documented         Pass
1112                Sent                      Pass
1113                Contents               Pass
1114                Completed            Pass
1114                Investigated          Pass

 

Data Want:

CaseID          Q1                     A1             Q2                 A2             Q3               A3
1111               Claim                Pass         Memos          Pass         Procedure   Fail
1112               Documented    Pass          Sent              Pass
1113               Contents          Pass
1114               Completed       Pass          Investigated  Pass

I first tried doing proc transpose but that didn't seem to get me what I needed.  I then tried an array, and it looked like it was working, but we found some weird data stuff going on.

Here is what I was trying below (Note that the number of questions and answers can vary on which tests were doing that given month.  I tried using variables for the array and was given errors, but 50 seemed to be enough for any given test):


data work.TM_Assessment_4c;
set work.TM_Assessment_3c;
by CaseID;
array questionarray {50} $400 Q1-Q50;
array answerarray {50} $400 A1-A50;

if first.CaseID then
i = 1;
questionarray(i) = QuestionText;
answerarray(i) = AnswerText;

if last.CaseID then
output;
i+1;
retain Q1-Q50;
retain A1-A50;
run;

proc sql;
create table work.TM_Assessment_5c AS
select TestPeriod
, TestName
, CaseID
, RecordPrimaryID
, CaseWorkDT
, AssignedOwner
, ManagerName
, Q1
, A1
, Q2
, A2
, Q3
, A3
, Q4
, A4
, Q5
, A5
, Q6
, A6
, Q7
, A7
, Q8
, A8
, Q9
, A9
, Q10
, A10
, Q11
, A11
, Q12
, A12
, Q13
, A13
, Q14
, A14
, Q15
, A15
, Q16
, A16
, Q17
, A17
, Q18
, A18
, Q19
, A19
, Q20
, A20
, Q21
, A21
, Q22
, A22
, Q23
, A23
, Q24
, A24
, Q25
, A25
, Q26
, A26
, Q27
, A27
, Q28
, A28
, Q29
, A29
, Q30
, A30
, Q31
, A31
, Q32
, A32
, Q33
, A33
, Q34
, A34
, Q35
, A35
, Q36
, A36
, Q37
, A37
, Q38
, A38
, Q39
, A39
, Q40
, A40
, Q41
, A41
, Q42
, A42
, Q43
, A43
, Q44
, A44
, Q45
, A45
, Q46
, A46
, Q47
, A47
, Q48
, A48
, Q49
, A49
, Q50
, A50
from work.TM_Assessment_4c
order by AssignedOwner, TestPeriod, TestName, RecordPrimaryID
;
quit;

The output data looked like it was what we wanted at first, but the actual data was not correct.  We were seeing instances where a case would have only one question and one answer, and while it was showing that in the data, it was also showing additional questions and answers on the same row that had nothing to do with it.

Can anyone help me out?

Thanks all.

12 REPLIES 12
Reeza
Super User
Quick question - why? I usually spend my time getting my survey data into the format shown to make reporting and analysis easier.
Maldra
Fluorite | Level 6
That is what the stakeholder requested. I'm not sure on the reasoning.
ballardw
Super User

Personally I find your "want" structure to be subject to many problems. I can see a strong possibility the the Q value of "Claim" could end up in many different variables is just processing in order.

I suggest something like this may be more appropriate where each "question text" becomes the variable name.

 

data example;
 input id $ Activity :$15. Status :$6.;
datalines;
1111  Claim   Pass
1111  Memos  Pass
1111  Procedure   Fail
1112  Documented Pass
1112  Sent    Pass
1113  Contents   Pass
1114  Completed  Pass
1114  Investigated  Pass
;


proc transpose data=example out=trans (drop=_name_);
   by id;
   var status;
   id Activity;
run;

Note providing some data in the form of a data step that you can run to duplicate the results and posted into a text box.

Maldra
Fluorite | Level 6
Unfortunately that won't work really since there are thousands of different questions since there are thousands of test with different questions. It was just dummy data that I provided; the actual questions are long sentences. I may just have to go to them with the "have" data format if I can't get the "want" format.
ballardw
Super User

@Maldra wrote:
Unfortunately that won't work really since there are thousands of different questions since there are thousands of test with different questions. It was just dummy data that I provided; the actual questions are long sentences. I may just have to go to them with the "have" data format if I can't get the "want" format.

Possibly even more reason not to make a data set in that format.

"Long sentences" might typically be a Label for a specific variable.

I would take a document, you do have one don't you, with all the "long sentence" values and make a data set or informat that links each specific sentence to a variable name to a specific order number. Then you have something that could add "Q1" or "A1" consistently.

Example:

proc format;
invalue text2order (upcase)

'CLAIM'        =  1 
'MEMOS'        =  2
'PROCEDURE'    =  3
'DOCUMENTED'   =  4
'SENT'         =  5
'CONTENTS'     =  6
'COMPLETED'    =  7
'INVESTIGATED' =  8
;
run;

data example;
  input CaseID :$10. QuestionText :$40. AnswerText :$10.;
datalines;
1111  Claim   Pass
1111  Memos  Pass
1111  Procedure   Fail
1112  Documented Pass
1112  Sent    Pass
1113  Contents   Pass
1114  Completed  Pass
1114  Investigated  Pass
;

data want;
   set example;
   array q(8) $ 40;
   array a(8) $ 10;
   retain q: a:;
   by caseid;
   if first.caseid then call missing(of q(*), of a(*));
   q[input(questiontext,text2order.)]= questiontext;
   a[input(questiontext,text2order.)]= AnswerText;
   if last.caseid;
   keep caseid q1-q8 a1-a8;
run;


If you have any form of electronic document with the "questions" you should be able to create the invalue using Proc Format and a CNTLIN data set.

ghosh
Barite | Level 11

Could you explain the logic of how the data is stored?

What I have observed is Q1 is always the first.caseid which you will save as QuestionText and A1 as the Answertext

Then you remove the first.caseid record, in the next iteration Q2 is the first.caseid.

and so on for Q3


Will this iterative process work with your survey data structre?

Tom
Super User Tom
Super User

Before you can make the data you need to answer this question:

 

Do you want the same question/answer pair to always end up in the same pair of variables across the cases?

 

For example do you want CLAIM to always go into Q1?  And if a case does not have the CLAIM quesiton that it has missing values of Q1?  Or perhaps missing values for A1 but Q1 still says CLAIM?

 

In that case you need to first map the Question text to the variable number.  Do you have a full set of possible values?  Or do you want to just use the set of questions in the current file?  If the later will you ever get any other files of this type?  Do they need to be mapped into the same variables?

 

Or is it ok that for the first case Q1 is CLAIM and for the second case Q1 is DOCUMENTED?

To just numbers them sequentially within each CASE is the easiest.

I like to use PROC SUMMARY.

First you need to know the maximum number of questions any single CASE could have.

Let's call that number N and stick it into a macro variable named N.  For your dummy data N is 3.

data have ;
  input CaseID :$10. QuestionText :$40. AnswerText :$10.;
cards;
1111  Claim             Pass
1111  Memos             Pass
1111  Procedure         Fail
1112  Documented        Pass
1112  Sent              Pass
1113  Contents          Pass
1114  Completed         Pass
1114  Investigated      Pass
;

%let N=3;
proc summary data=have nway ;
  by caseid;
  output out=want(drop=_type_) idgroup (out[&n] (questiontext answertext)=);
run;

Results:

       Case              Question      Question        Question     Answer    Answer    Answer
Obs     ID     _FREQ_    Text_1        Text_2           Text_3      Text_1    Text_2    Text_3

 1     1111       3      Claim         Memos           Procedure     Pass      Pass      Fail
 2     1112       2      Documented    Sent                          Pass      Pass
 3     1113       1      Contents                                    Pass
 4     1114       2      Completed     Investigated                  Pass      Pass
Maldra
Fluorite | Level 6
Each test has a maximum of about 45 questions, but there are a lot of different tests with a lot of different questions. If it were that way, there would be thousands of columns, which is not what they were looking for. They just wanted all of the questions and answers one after the other in a single row.
Reeza
Super User
Looks like Tom's answer will work for you then 🙂
ballardw
Super User

Nothing to identify what the source of the questions is???

If there were such a field then this is partially defensible but I bet you are going to deal with issues of dates and numeric values converting to text somewhere along the line.

Maldra
Fluorite | Level 6

After talking to the stakeholders some more, the reason they want it in this format is so they can enter it into a system, and that is how it takes the data.  The solution of having gaps also wouldn't work.  It needs to look like my original post.  Is this something SAS is able to do?

Tom
Super User Tom
Super User

You have a couple of answer to produce what you asked for.

Please explain it what way they are not what you need?

 

If they are entering into a "system" what type of file do they need?

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 833 views
  • 5 likes
  • 5 in conversation