SAS Programming

jjsooner · Posted 03-06-2020 03:25 PM

Hi, first time poster here. I am just starting to use SAS after a couple long layoffs, so am more than a bit rusty.

I'm using SAS 9.4 for Windows.

Anyhow, I'm trying to rollup some data that into one row per ID/question.

My data looks like this:

ID	Q	Answer
1	1	None
1	2	Read Aloud
1	3	Distract
1	4	Overall
2	1	Single
2	2	Sign
2	2	PAS
2	3	None
2	4	Behavior
3	1	Two choice
3	2	Sign
3	3	Too high
3	4	Overall
3	4	Behavior
3	4	Environment

I want it to look like this:

ID	Q	Answer
1	1	None
1	2	Read Aloud
1	3	Distract
1	4	Overall
2	1	Single
2	2	Sign, PAS
2	3	None
2	4	Behavior
3	1	Two choice
3	2	Sign
3	3	Too High
3	4	Overall, Behavior, Environment

My code looks like this:

data want;
set have (rename=(answer=answer1));
by id q;
length answer $250; *there are up to 6 answers that can be combined here; 
retain answer 
if first.id and first.q then answer=' ';
answer=catx(',',answer, answer1);
if last.q then output;
run;

It's giving me all the combined answers in last.q - (i.e. for ID 1, Q 4, it would give me None, Read Aloud, Distract, Overall). I see why this is happening, but I can't seem to find the fix that gives me what I want even though I know it must be simple and is staring straight at me!

As an extension, I may want to then expand Answer into up to 6 columns, but I can figure that part out once I get there.

Thanks!

PaigeMiller · Posted 03-06-2020 03:36 PM

It might help if you explained what you are going to do with this "rolled up" data after you have it, because there might be easier and more efficient ways of getting to your end result. As it is, the creating of text strings like "Overall, Behavior, Environment" is usually less efficient and harder to work with than the original long data set you showed.

--
Paige Miller

jjsooner · Posted 03-06-2020 03:47 PM

Simple frequencies, nothing fancy.

We have survey data that had some multiple choice-multiple select options. As it came to me, I had a variable called text that had up to 6 number codes, each with brackets (i.e., [123456], [435345]), that represented the choices selected in one field. That had to be "unrolled," matched with the file that had the text options (this had to be matched with three by variables), and that's where I am at now.
I want my dataset to have one row per respondent/question instead of the current 1-5 rows per respondent per question.

Reeza · Posted 03-06-2020 03:51 PM

No...the long format is better for survey data analysis IMO.
Then your PROC FREQ and MEANS come out into better aggregate tables.

Right now you could run all at once and it would come into a single table nicely to be displayed.

PaigeMiller · Posted 03-06-2020 03:57 PM

Simple frequencies, nothing fancy.

Then rolling up the data the way you are doing it is not helpful, it is unnecessary work, and it is inefficient. You can use PROC FREQ to get simple frequencies from your original data set, three lines of code, no re-arranging the data.

proc freq data=have;
    tables id*q/list;
run;

--
Paige Miller

jjsooner · Posted 03-06-2020 04:05 PM

ID is representative of an individual. I actually did do proc freq first with the data as it is, but that did not get me what I want, which is why I want to roll up to one row per individual.

(proc freq data=have;

tables q*answer/list;

run;)

Reeza · Posted 03-06-2020 04:14 PM

Show us what you want. It's possible you need this format, but I doubt it.

Reeza · Posted 03-06-2020 03:46 PM

Although this illustrates how to answer your question, I'd also recommend considering @PaigeMiller suggestion of rethinking this. It's usually not a great idea.

*create sample data for demonstration;
data have;
    infile cards dlm='09'x;
    input OrgID Product $   States $;
    cards;
1   football    DC
1   football    VA
1   football    MD
2   football    CA
3   football    NV
3   football    CA
;
run;

*Sort - required for both options;
proc sort data=have;
    by orgID;
run;

**********************************************************************;
*Use RETAIN and BY group processing to combine the information;
**********************************************************************;
data want_option1;
    set have;
    by orgID;
    length combined $100.;
    retain combined;

    if first.orgID then
        combined=states;
    else
        combined=catx(', ', combined, states);

    if last.orgID then
        output;
run;

**********************************************************************;
*Transpose it to a wide format and then combine into a single field;
**********************************************************************;
proc transpose data=have out=wide prefix=state_;
    by orgID;
    var states;
run;

data want_option2;
    set wide;
    length combined $100.;
    combined=catx(', ', of state_:);
run;

https://gist.github.com/statgeek/d583cfa992bf56da51d435165b07e96a

novinosrin · Posted 03-06-2020 03:52 PM

Hi @jjsooner

data have;
input ID	Q	Answer & $15.;
cards;
1	1	None
1	2	Read Aloud
1	3	Distract
1	4	Overall
2	1	Single
2	2	Sign
2	2	PAS
2	3	None
2	4	Behavior
3	1	Two choice
3	2	Sign
3	3	Too high
3	4	Overall
3	4	Behavior
3	4	Environment
;

data want;
 do until(last.q);
  set have;
  by id q;
  length _answer $100;
  _answer=catx(',',_answer,answer);
 end;
 drop answer;
 rename _answer=answer;
run;

jjsooner · Posted 03-06-2020 05:30 PM

This seemed to work, but then I realized my n-count is not what I expected. I'm not sure why and am out of steam today. I will revisit next week with fresh eyes. Thanks!

ballardw · Posted 03-06-2020 04:54 PM

I often find that if I clearly define what the research question(s) are then I get ideas on how to process the data or if it is even needed.

That way I know what needs to be counted and / or grouped.

By the way, in your later analysis what would you do with such result from different respondents?

2	2	Sign, PAS
27	2	PAS, Sign

jjsooner · Posted 03-06-2020 05:41 PM

It's multiple choice, multiple select survey. What you described will not happen.

ballardw · Posted 03-06-2020 06:32 PM

@jjsooner wrote:

It's multiple choice, multiple select survey. What you described will not happen.

Don't bet on that without confirming it.

One of the survey data collection systems I worked with had options that when a question was multiple choice you could get the response in either of question appearance order or in response selection order. Considering that some questions in surveys will ask people to Rank items by importance/ need/ frequency of use and similar that option is likely available.

The order of multiple response could be 1) question appearance order, 2) value order (alphabetic or numeric) or 3) response selection order that I have worked with or 4) multiple binary coded variables to indicate choice selected or not. Of the 4 options only 1 would be as you may be assuming.

We won't even go into the possibility of data edited by the collectors but prior to you receiving the data.

RichardDeVen · Posted 03-06-2020 05:39 PM

Change

if first.id and first.q then answer = ' ';

to

if first.q then answer = ' ';

and

if last.q then output;
end;

to

if last.q;

Make sure you have all the needed semi-colons

retain answer ;

SAS Programming

How do I roll up data with two by variables?

Re: How do I roll up data with two by variables?

Re: How do I roll up data with two by variables?

Re: How do I roll up data with two by variables?

Re: How do I roll up data with two by variables?

Re: How do I roll up data with two by variables?

Re: How do I roll up data with two by variables?

Re: How do I roll up data with two by variables?

Re: How do I roll up data with two by variables?

Re: How do I roll up data with two by variables?

Re: How do I roll up data with two by variables?

Re: How do I roll up data with two by variables?

Re: How do I roll up data with two by variables?

Re: How do I roll up data with two by variables?

Follow Us

What is...

SAS Programming

Our biggest data and AI event of the year.

SAS Training: Just a Click Away

Follow Us

What is...