Hi, first time poster here. I am just starting to use SAS after a couple long layoffs, so am more than a bit rusty.
I'm using SAS 9.4 for Windows.
Anyhow, I'm trying to rollup some data that into one row per ID/question.
My data looks like this:
| ID | Q | Answer | 
| 1 | 1 | None | 
| 1 | 2 | Read Aloud | 
| 1 | 3 | Distract | 
| 1 | 4 | Overall | 
| 2 | 1 | Single | 
| 2 | 2 | Sign | 
| 2 | 2 | PAS | 
| 2 | 3 | None | 
| 2 | 4 | Behavior | 
| 3 | 1 | Two choice | 
| 3 | 2 | Sign | 
| 3 | 3 | Too high | 
| 3 | 4 | Overall | 
| 3 | 4 | Behavior | 
| 3 | 4 | Environment | 
I want it to look like this:
| ID | Q | Answer | 
| 1 | 1 | None | 
| 1 | 2 | Read Aloud | 
| 1 | 3 | Distract | 
| 1 | 4 | Overall | 
| 2 | 1 | Single | 
| 2 | 2 | Sign, PAS | 
| 2 | 3 | None | 
| 2 | 4 | Behavior | 
| 3 | 1 | Two choice | 
| 3 | 2 | Sign | 
| 3 | 3 | Too High | 
| 3 | 4 | Overall, Behavior, Environment | 
My code looks like this:
data want;
set have (rename=(answer=answer1));
by id q;
length answer $250; *there are up to 6 answers that can be combined here; 
retain answer 
if first.id and first.q then answer=' ';
answer=catx(',',answer, answer1);
if last.q then output;
run;
It's giving me all the combined answers in last.q - (i.e. for ID 1, Q 4, it would give me None, Read Aloud, Distract, Overall). I see why this is happening, but I can't seem to find the fix that gives me what I want even though I know it must be simple and is staring straight at me!
As an extension, I may want to then expand Answer into up to 6 columns, but I can figure that part out once I get there.
Thanks!
It might help if you explained what you are going to do with this "rolled up" data after you have it, because there might be easier and more efficient ways of getting to your end result. As it is, the creating of text strings like "Overall, Behavior, Environment" is usually less efficient and harder to work with than the original long data set you showed.
Simple frequencies, nothing fancy.
We have survey data that had some multiple choice-multiple select options. As it came to me, I had a variable called text that had up to 6 number codes, each with brackets (i.e., [123456], [435345]), that represented the choices selected in one field. That had to be "unrolled," matched with the file that had the text options (this had to be matched with three by variables), and that's where I am at now.
I want my dataset to have one row per respondent/question instead of the current 1-5 rows per respondent per question. 
Simple frequencies, nothing fancy.
Then rolling up the data the way you are doing it is not helpful, it is unnecessary work, and it is inefficient. You can use PROC FREQ to get simple frequencies from your original data set, three lines of code, no re-arranging the data.
proc freq data=have;
    tables id*q/list;
run;ID is representative of an individual. I actually did do proc freq first with the data as it is, but that did not get me what I want, which is why I want to roll up to one row per individual.
(proc freq data=have;
tables q*answer/list;
run;)
Although this illustrates how to answer your question, I'd also recommend considering @PaigeMiller suggestion of rethinking this. It's usually not a great idea.
*create sample data for demonstration;
data have;
    infile cards dlm='09'x;
    input OrgID Product $   States $;
    cards;
1   football    DC
1   football    VA
1   football    MD
2   football    CA
3   football    NV
3   football    CA
;
run;
*Sort - required for both options;
proc sort data=have;
    by orgID;
run;
**********************************************************************;
*Use RETAIN and BY group processing to combine the information;
**********************************************************************;
data want_option1;
    set have;
    by orgID;
    length combined $100.;
    retain combined;
    if first.orgID then
        combined=states;
    else
        combined=catx(', ', combined, states);
    if last.orgID then
        output;
run;
**********************************************************************;
*Transpose it to a wide format and then combine into a single field;
**********************************************************************;
proc transpose data=have out=wide prefix=state_;
    by orgID;
    var states;
run;
data want_option2;
    set wide;
    length combined $100.;
    combined=catx(', ', of state_:);
run;
https://gist.github.com/statgeek/d583cfa992bf56da51d435165b07e96a
Hi @jjsooner
data have;
input ID	Q	Answer & $15.;
cards;
1	1	None
1	2	Read Aloud
1	3	Distract
1	4	Overall
2	1	Single
2	2	Sign
2	2	PAS
2	3	None
2	4	Behavior
3	1	Two choice
3	2	Sign
3	3	Too high
3	4	Overall
3	4	Behavior
3	4	Environment
;
data want;
 do until(last.q);
  set have;
  by id q;
  length _answer $100;
  _answer=catx(',',_answer,answer);
 end;
 drop answer;
 rename _answer=answer;
run;This seemed to work, but then I realized my n-count is not what I expected. I'm not sure why and am out of steam today. I will revisit next week with fresh eyes. Thanks!
I often find that if I clearly define what the research question(s) are then I get ideas on how to process the data or if it is even needed.
That way I know what needs to be counted and / or grouped.
By the way, in your later analysis what would you do with such result from different respondents?
| 2 | 2 | Sign, PAS | 
| 27 | 2 | PAS, Sign | 
It's multiple choice, multiple select survey. What you described will not happen.
@jjsooner wrote:
It's multiple choice, multiple select survey. What you described will not happen.
Don't bet on that without confirming it.
One of the survey data collection systems I worked with had options that when a question was multiple choice you could get the response in either of question appearance order or in response selection order. Considering that some questions in surveys will ask people to Rank items by importance/ need/ frequency of use and similar that option is likely available.
The order of multiple response could be 1) question appearance order, 2) value order (alphabetic or numeric) or 3) response selection order that I have worked with or 4) multiple binary coded variables to indicate choice selected or not. Of the 4 options only 1 would be as you may be assuming.
We won't even go into the possibility of data edited by the collectors but prior to you receiving the data.
Change
if first.id and first.q then answer = ' ';to
if first.q then answer = ' ';
and
if last.q then output;
end;to
if last.q;
Make sure you have all the needed semi-colons
retain answer ;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
