I have variables for 58 questions (q1,q2,...,q58).
Each variable has values 1,2,3.
I would like to create a dataset with 4 variables (and 58 rows).
The first variable, q, should contain a list of q1,...,q58.
The second variable (x1) is the percentage of 1s for each question.
The third variable (x2) is the percentage of 2s for each question.
The fourth variable (x3) is the percentage of 3s fore each question.
The following code (after transposing the results) gets the answer for q1, but how would I do this for q1-q58 and then transpose the results so that I have 4 columns?
ods output onewayfreqs=percents(keep=q1 percent);
proc freq data=test;
tables q1;
run;
ods output close;
Thank you.
I wouldn't bother with trying to transpose or use any proc to do this. If I correctly understand what you want, a datastep could do everything and faster and easier than most methods. E.g.:
%let qmax=5;
data have;
input q1-q&qmax.;
cards;
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
1 1 1 1 1
2 2 2 2 2
2 2 2 2 2
;
data want (keep=q x:);
array _q(*) q1-q&qmax.;
array _x1(&qmax.);
array _x2(&qmax.);
array _x3(&qmax.);
set have end=last;
retain _x1: _x2: _x3:;
do i=1 to &qmax.;
_x1(i)+(_q(i) eq 1);
_x2(i)+(_q(i) eq 2);
_x3(i)+(_q(i) eq 3);
if i eq 1 then _xtotal+1;
if last then do;
q=vname(_q(i));
x1=_x1(i)/_xtotal;
x2=_x2(i)/_xtotal;
x3=_x3(i)/_xtotal;
output;
end;
end;
run;
This code:
data test;
input q $ ans;
cards;
q1 1
q1 2
q1 3
q1 4
q1 1
q2 1
q2 2
q3 1
;
run;
proc sql;
create table summary as
select
q,
sum(case when ans=1 then responses else 0 end)/count(*) as Pct_1,
sum(case when ans=2 then responses else 0 end)/count(*) as Pct_2,
sum(case when ans=3 then responses else 0 end)/count(*) as Pct_3,
sum(case when ans=4 then responses else 0 end)/count(*) as Pct_4
from
(select q, ans, count(*) as Responses from work.test group by q, ans)
group by q;
select * from summary;
quit;
produces this output:
q | Pct_1 | Pct_2 | Pct_3 | Pct_4 |
---|---|---|---|---|
q1 | 0.5 | 0.25 | 0.25 | 0.25 |
q2 | 0.5 | 0.5 | 0 | 0 |
q3 | 1 | 0 | 0 | 0 |
The dataset I'm using has one variable for each of the 58 questions (plus some other variables). For various reasons I'm not able to transpose it so that it has two variables like the dataset you created for your example.
Do you know how to find the percentages (x1, x2, x3) for each of the 58 questions and then create four columns (q, x1, x2, x3)?
I don't understand what your data look like. Can you give a quick example?
I have some other variables that I plan to subset the data set with (departments and divisions), but these are the variables I wish to create percentages for (x1, x2, x3).
q1 q2 q3 .... q58
1 1 2 ..... 3
2 3 1 ...... 2
1 1 3 ...... 2
The output you created is exactly what I would like my output to look like.
I wouldn't bother with trying to transpose or use any proc to do this. If I correctly understand what you want, a datastep could do everything and faster and easier than most methods. E.g.:
%let qmax=5;
data have;
input q1-q&qmax.;
cards;
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
1 1 1 1 1
2 2 2 2 2
2 2 2 2 2
;
data want (keep=q x:);
array _q(*) q1-q&qmax.;
array _x1(&qmax.);
array _x2(&qmax.);
array _x3(&qmax.);
set have end=last;
retain _x1: _x2: _x3:;
do i=1 to &qmax.;
_x1(i)+(_q(i) eq 1);
_x2(i)+(_q(i) eq 2);
_x3(i)+(_q(i) eq 3);
if i eq 1 then _xtotal+1;
if last then do;
q=vname(_q(i));
x1=_x1(i)/_xtotal;
x2=_x2(i)/_xtotal;
x3=_x3(i)/_xtotal;
output;
end;
end;
run;
Thanks pretty neat..expanding on that for this specifics:
This code:
data test;
input
q1 q2 q3 q4 q5 q6 q7 q8 q9 q10
q11 q12 q13 q14 q15 q16 q17 q18 q19 q20
q21 q22 q23 q24 q25 q26 q27 q28 q29 q30
q31 q32 q33 q34 q35 q36 q37 q38 q39 q40
q41 q42 q43 q44 q45 q46 q47 q48;
cards;
1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
;
run;
data want (keep=q x:);
array _q(*) q1-q48;
array _x1(48);
array _x2(48);
array _x3(48);
array _x4(48);
set test end=last;
retain _x1: _x2: _x3: _x4: ;
do i=1 to 48;
_x1(i)+(_q(i) eq 1);
_x2(i)+(_q(i) eq 2);
_x3(i)+(_q(i) eq 3);
_x4(i)+(_q(i) eq 4);
if i eq 1 then _xtotal+1;
if last then do;
q=vname(_q(i));
x1=_x1(i)/_xtotal;
x2=_x2(i)/_xtotal;
x3=_x3(i)/_xtotal;
x4=_x4(i)/_xtotal;
output;
end;
end;
run;
produces this output:
q1 0.4 0.2 0.2 0.2
q2 0.2 0.4 0.2 0.2
q3 0.2 0.2 0.4 0.2
q4 0.2 0.2 0.2 0.4
q5 0.4 0.2 0.2 0.2
q6 0.2 0.4 0.2 0.2
q7 0.2 0.2 0.4 0.2
q8 0.2 0.2 0.2 0.4
q9 0.4 0.2 0.2 0.2
q10 0.2 0.4 0.2 0.2
q11 0.2 0.2 0.4 0.2
q12 0.2 0.2 0.2 0.4
q13 0.4 0.2 0.2 0.2
q14 0.2 0.4 0.2 0.2
q15 0.2 0.2 0.4 0.2
q16 0.2 0.2 0.2 0.4
q17 0.4 0.2 0.2 0.2
q18 0.2 0.4 0.2 0.2
q19 0.2 0.2 0.4 0.2
q20 0.2 0.2 0.2 0.4
q21 0.4 0.2 0.2 0.2
q22 0.2 0.4 0.2 0.2
q23 0.2 0.2 0.4 0.2
q24 0.2 0.2 0.2 0.4
q25 0.4 0.2 0.2 0.2
q26 0.2 0.4 0.2 0.2
q27 0.2 0.2 0.4 0.2
q28 0.2 0.2 0.2 0.4
q29 0.4 0.2 0.2 0.2
q30 0.2 0.4 0.2 0.2
q31 0.2 0.2 0.4 0.2
q32 0.2 0.2 0.2 0.4
q33 0.4 0.2 0.2 0.2
q34 0.2 0.4 0.2 0.2
q35 0.2 0.2 0.4 0.2
q36 0.2 0.2 0.2 0.4
q37 0.4 0.2 0.2 0.2
q38 0.2 0.4 0.2 0.2
q39 0.2 0.2 0.4 0.2
q40 0.2 0.2 0.2 0.4
q41 0.4 0.2 0.2 0.2
q42 0.2 0.4 0.2 0.2
q43 0.2 0.2 0.4 0.2
q44 0.2 0.2 0.2 0.4
q45 0.4 0.2 0.2 0.2
q46 0.2 0.4 0.2 0.2
q47 0.2 0.2 0.4 0.2
q48 0.2 0.2 0.2 0.4
I like the simplicity and flexibility of PROC/DATA/ PROC.
data test;
input (q1-q48)(f1.);
format q: f1.;
cards;
123412341234123412341234123412341234123412341234
222222222222222222222222222222222222222222222222
333333333333333333333333333333333333333333333333
444444444444444444444444444444444444444444444444
111111111111111111111111111111111111111111111111
;;;;
run;
ods output onewayfreqs=freq;
ods listing off;
proc freq;
tables q:;
run;
ods listing;
data freq;
set freq;
length vname $32;
vname = scan(table,-1);
resp = coalesce(of q:);
format resp f1.;
keep vname resp Percent;
run;
proc transpose prefix=R out=_data_(drop=_:);
by vname notsorted;
var percent;
id resp;
idlabel resp;
run;
proc print label;
run;
data temp;
input id q1 q2 q3;
cards;
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 1 2 3
6 2 3 4
;
run;
data _null_;
do i=1 to 3;
call execute(
'proc freq data=temp;' ||
' tables q' || strip(i) || '/out=percent' || strip(i) || '(keep=percent) noprint;' ||
'run;'
);
call execute(
'proc transpose data=percent' || strip(i) || '(rename=(percent=q' || strip(i) || '))' ||
' out=ptran' || strip(i) || '(drop=_label_ rename=(_name_=question)) prefix=pct;' ||
'run;'
);
end;
run;
data percent;
set ptran:;
run;
could also be done in a macro, but meh...
EDIT: Just had an additional thought to the above. You could significantly reduce the I/O requirements of the frequency step by replace the multiple calls of the frequency procedure by instead creating multiple tables statements instead... Gains would become noticeable with increasing quantities/variables.
Almost like _null_'s code:
data test; input q1-q48; cards; 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ; run; ods output onewayfreqs=freq; proc freq data=test ; tables q: /nofreq nocum nocol ; run; data temp(keep=tname level percent); set freq(drop=f_: ); tname=scan(table,2); level=coalesce(of q:); run; proc sort data=temp;by tname;run; proc transpose data=temp out=want(drop=_name_) prefix=x ; by tname; id level; var percent; run; data want; set want; flag=input(scan(tname,1,,'kd'),best8.); run; proc sort data=want;by flag;run;
Ksharp
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.