- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi, first time poster here. I am just starting to use SAS after a couple long layoffs, so am more than a bit rusty.
I'm using SAS 9.4 for Windows.
Anyhow, I'm trying to rollup some data that into one row per ID/question.
My data looks like this:
ID | Q | Answer |
1 | 1 | None |
1 | 2 | Read Aloud |
1 | 3 | Distract |
1 | 4 | Overall |
2 | 1 | Single |
2 | 2 | Sign |
2 | 2 | PAS |
2 | 3 | None |
2 | 4 | Behavior |
3 | 1 | Two choice |
3 | 2 | Sign |
3 | 3 | Too high |
3 | 4 | Overall |
3 | 4 | Behavior |
3 | 4 | Environment |
I want it to look like this:
ID | Q | Answer |
1 | 1 | None |
1 | 2 | Read Aloud |
1 | 3 | Distract |
1 | 4 | Overall |
2 | 1 | Single |
2 | 2 | Sign, PAS |
2 | 3 | None |
2 | 4 | Behavior |
3 | 1 | Two choice |
3 | 2 | Sign |
3 | 3 | Too High |
3 | 4 | Overall, Behavior, Environment |
My code looks like this:
data want;
set have (rename=(answer=answer1));
by id q;
length answer $250; *there are up to 6 answers that can be combined here;
retain answer
if first.id and first.q then answer=' ';
answer=catx(',',answer, answer1);
if last.q then output;
run;
It's giving me all the combined answers in last.q - (i.e. for ID 1, Q 4, it would give me None, Read Aloud, Distract, Overall). I see why this is happening, but I can't seem to find the fix that gives me what I want even though I know it must be simple and is staring straight at me!
As an extension, I may want to then expand Answer into up to 6 columns, but I can figure that part out once I get there.
Thanks!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It might help if you explained what you are going to do with this "rolled up" data after you have it, because there might be easier and more efficient ways of getting to your end result. As it is, the creating of text strings like "Overall, Behavior, Environment" is usually less efficient and harder to work with than the original long data set you showed.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Simple frequencies, nothing fancy.
We have survey data that had some multiple choice-multiple select options. As it came to me, I had a variable called text that had up to 6 number codes, each with brackets (i.e., [123456], [435345]), that represented the choices selected in one field. That had to be "unrolled," matched with the file that had the text options (this had to be matched with three by variables), and that's where I am at now.
I want my dataset to have one row per respondent/question instead of the current 1-5 rows per respondent per question.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Then your PROC FREQ and MEANS come out into better aggregate tables.
Right now you could run all at once and it would come into a single table nicely to be displayed.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Simple frequencies, nothing fancy.
Then rolling up the data the way you are doing it is not helpful, it is unnecessary work, and it is inefficient. You can use PROC FREQ to get simple frequencies from your original data set, three lines of code, no re-arranging the data.
proc freq data=have;
tables id*q/list;
run;
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
ID is representative of an individual. I actually did do proc freq first with the data as it is, but that did not get me what I want, which is why I want to roll up to one row per individual.
(proc freq data=have;
tables q*answer/list;
run;)
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Although this illustrates how to answer your question, I'd also recommend considering @PaigeMiller suggestion of rethinking this. It's usually not a great idea.
*create sample data for demonstration;
data have;
infile cards dlm='09'x;
input OrgID Product $ States $;
cards;
1 football DC
1 football VA
1 football MD
2 football CA
3 football NV
3 football CA
;
run;
*Sort - required for both options;
proc sort data=have;
by orgID;
run;
**********************************************************************;
*Use RETAIN and BY group processing to combine the information;
**********************************************************************;
data want_option1;
set have;
by orgID;
length combined $100.;
retain combined;
if first.orgID then
combined=states;
else
combined=catx(', ', combined, states);
if last.orgID then
output;
run;
**********************************************************************;
*Transpose it to a wide format and then combine into a single field;
**********************************************************************;
proc transpose data=have out=wide prefix=state_;
by orgID;
var states;
run;
data want_option2;
set wide;
length combined $100.;
combined=catx(', ', of state_:);
run;
https://gist.github.com/statgeek/d583cfa992bf56da51d435165b07e96a
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @jjsooner
data have;
input ID Q Answer & $15.;
cards;
1 1 None
1 2 Read Aloud
1 3 Distract
1 4 Overall
2 1 Single
2 2 Sign
2 2 PAS
2 3 None
2 4 Behavior
3 1 Two choice
3 2 Sign
3 3 Too high
3 4 Overall
3 4 Behavior
3 4 Environment
;
data want;
do until(last.q);
set have;
by id q;
length _answer $100;
_answer=catx(',',_answer,answer);
end;
drop answer;
rename _answer=answer;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This seemed to work, but then I realized my n-count is not what I expected. I'm not sure why and am out of steam today. I will revisit next week with fresh eyes. Thanks!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I often find that if I clearly define what the research question(s) are then I get ideas on how to process the data or if it is even needed.
That way I know what needs to be counted and / or grouped.
By the way, in your later analysis what would you do with such result from different respondents?
2 | 2 | Sign, PAS |
27 | 2 | PAS, Sign |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It's multiple choice, multiple select survey. What you described will not happen.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@jjsooner wrote:
It's multiple choice, multiple select survey. What you described will not happen.
Don't bet on that without confirming it.
One of the survey data collection systems I worked with had options that when a question was multiple choice you could get the response in either of question appearance order or in response selection order. Considering that some questions in surveys will ask people to Rank items by importance/ need/ frequency of use and similar that option is likely available.
The order of multiple response could be 1) question appearance order, 2) value order (alphabetic or numeric) or 3) response selection order that I have worked with or 4) multiple binary coded variables to indicate choice selected or not. Of the 4 options only 1 would be as you may be assuming.
We won't even go into the possibility of data edited by the collectors but prior to you receiving the data.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Change
if first.id and first.q then answer = ' ';
to
if first.q then answer = ' ';
and
if last.q then output;
end;
to
if last.q;
Make sure you have all the needed semi-colons
retain answer ;