SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
jjsooner
Calcite | Level 5

Hi, first time poster here. I am just starting to use SAS after a couple long layoffs, so am more than a bit rusty. 

I'm using SAS 9.4 for Windows. 

Anyhow, I'm trying to rollup some data that into one row per ID/question. 

My data looks like this: 

 

IDQAnswer
11None
12Read Aloud
13Distract
14Overall
21Single
22Sign
22PAS
23None
24Behavior
31Two choice
32Sign
33Too high
34Overall
34Behavior
34Environment

 

I want it to look like this:

IDQAnswer 
11None
12Read Aloud
13Distract
14Overall
21Single
22Sign, PAS
23None
24Behavior
31Two choice
32Sign
33Too High
34Overall, Behavior, Environment

 

My code looks like this:

data want;
set have (rename=(answer=answer1));
by id q;
length answer $250; *there are up to 6 answers that can be combined here; 
retain answer 
if first.id and first.q then answer=' ';
answer=catx(',',answer, answer1);
if last.q then output;
run;

 

It's giving me all the combined answers in last.q - (i.e. for ID 1, Q 4, it would give me None, Read Aloud, Distract, Overall). I see why this is happening, but I can't seem to find the fix that gives me what I want even though I know it must be simple and is staring straight at me! 

 

As an extension, I may want to then expand Answer into up to 6 columns, but I can figure that part out once I get there. 

 

Thanks!

13 REPLIES 13
PaigeMiller
Diamond | Level 26

It might help if you explained what you are going to do with this "rolled up" data after you have it, because there might be easier and more efficient ways of getting to your end result. As it is, the creating of text strings like "Overall, Behavior, Environment" is usually less efficient and harder to work with than the original long data set you showed.

--
Paige Miller
jjsooner
Calcite | Level 5

Simple frequencies, nothing fancy.

We have survey data that had some multiple choice-multiple select options. As it came to me, I had a variable called text that had up to 6 number codes, each with brackets (i.e., [123456], [435345]), that represented the choices selected in one field. That had to be "unrolled," matched with the file that had the text options (this had to be matched with three by variables), and that's where I am at now.
I want my dataset to have one row per respondent/question instead of the current 1-5 rows per respondent per question. 

Reeza
Super User
No...the long format is better for survey data analysis IMO.
Then your PROC FREQ and MEANS come out into better aggregate tables.

Right now you could run all at once and it would come into a single table nicely to be displayed.
PaigeMiller
Diamond | Level 26

Simple frequencies, nothing fancy.


Then rolling up the data the way you are doing it is not helpful, it is unnecessary work, and it is inefficient. You can use PROC FREQ to get simple frequencies from your original data set, three lines of code, no re-arranging the data.

 

proc freq data=have;
    tables id*q/list;
run;
--
Paige Miller
jjsooner
Calcite | Level 5

ID is representative of an individual. I actually did do proc freq first with the data as it is, but that did not get me what I want, which is why I want to roll up to one row per individual. 

 

(proc freq data=have; 

 tables q*answer/list;

run;) 

Reeza
Super User
Show us what you want. It's possible you need this format, but I doubt it.
Reeza
Super User

Although this illustrates how to answer your question, I'd also recommend considering @PaigeMiller suggestion of rethinking this. It's usually not a great idea. 

 

*create sample data for demonstration;
data have;
    infile cards dlm='09'x;
    input OrgID Product $   States $;
    cards;
1   football    DC
1   football    VA
1   football    MD
2   football    CA
3   football    NV
3   football    CA
;
run;

*Sort - required for both options;
proc sort data=have;
    by orgID;
run;

**********************************************************************;
*Use RETAIN and BY group processing to combine the information;
**********************************************************************;
data want_option1;
    set have;
    by orgID;
    length combined $100.;
    retain combined;

    if first.orgID then
        combined=states;
    else
        combined=catx(', ', combined, states);

    if last.orgID then
        output;
run;

**********************************************************************;
*Transpose it to a wide format and then combine into a single field;
**********************************************************************;
proc transpose data=have out=wide prefix=state_;
    by orgID;
    var states;
run;

data want_option2;
    set wide;
    length combined $100.;
    combined=catx(', ', of state_:);
run;

 

https://gist.github.com/statgeek/d583cfa992bf56da51d435165b07e96a

novinosrin
Tourmaline | Level 20

Hi @jjsooner 

data have;
input ID	Q	Answer & $15.;
cards;
1	1	None
1	2	Read Aloud
1	3	Distract
1	4	Overall
2	1	Single
2	2	Sign
2	2	PAS
2	3	None
2	4	Behavior
3	1	Two choice
3	2	Sign
3	3	Too high
3	4	Overall
3	4	Behavior
3	4	Environment
;

data want;
 do until(last.q);
  set have;
  by id q;
  length _answer $100;
  _answer=catx(',',_answer,answer);
 end;
 drop answer;
 rename _answer=answer;
run;
jjsooner
Calcite | Level 5

This seemed to work, but then I realized my n-count is not what I expected. I'm not sure why and am out of steam today. I will revisit next week with fresh eyes. Thanks! 

ballardw
Super User

I often find that if I clearly define what the research question(s) are then I get ideas on how to process the data or if it is even needed.

 

That way I know what needs to be counted and / or grouped.

 

By the way, in your later analysis what would you do with such result from different respondents?

 

2 2 Sign, PAS
27 2 PAS, Sign
jjsooner
Calcite | Level 5

It's multiple choice, multiple select survey. What you described will not happen. 

ballardw
Super User

@jjsooner wrote:

It's multiple choice, multiple select survey. What you described will not happen. 


Don't bet on that without confirming it.

One of the survey data collection systems I worked with had options that when a question was multiple choice you could get the response in either of question appearance order or in response selection order. Considering that some questions in surveys will ask people to Rank items by importance/ need/ frequency of use and similar that option is likely available.

The order of multiple response could be 1) question appearance order, 2) value order (alphabetic or numeric) or 3) response selection order that I have worked with or 4) multiple binary coded variables to indicate choice selected or not. Of the 4 options only 1 would be as you may be assuming.

 

We won't even go into the possibility of data edited by the collectors but prior to you receiving the data.

RichardDeVen
Barite | Level 11

Change 

if first.id and first.q then answer = ' ';

to

if first.q then answer = ' ';

 

and

if last.q then output;
end;

 to 

if last.q;

 

Make sure you have all the needed semi-colons

retain answer ;

 

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 3343 views
  • 4 likes
  • 6 in conversation