I am having trouble when dealing with multiple column:
column A B C D and each column have value of either 0 or 1. I want to make a new variable to be able to use the PROC FREQ function for the new variable so that I can know the frequency of each individual column. Any thoughts please ??
Does the following accomplish what you want/need?:
data have;
input ID $ TestA TestB TestC TestD TestE;
cards;
AB348 0 0 1 0 1
SB876 0 1 1 0 0
VB912 0 0 0 0 0
SK623 1 0 0 1 0
;
data need (keep=id test);
set have;
length test $5;
array _test TestA--TestE;
do over _test;
if _test eq 1 then do;
test=vname(_test);
output;
end;
end;
run;
proc freq data=need;
table test;
run;
Could you clarify exactly what you want please. My reading of the question is that you want to create a new variable that sums across the values of variables A - D so that you can perform a proc freq on that new variable, is that correct?
Thanks for your reply.
This is what I am looking for:
AB348 0 0 0 0 1 0
SB876 0 1 0 0 0 0
VB912 0 0 1 0 0
So, I want to name a new variable which consists of the last five variable in the above data set. Then, I want to do the PROC FREQ on the new variable so that I can get the individual freq of each of the five variables.
I hope I am clear this time.
Thanks.
No, I think you have to be clearer. In addition to the fact that your example has 2 rows with 7 variables each, and one with 6, what do you want the resulting dataset to look like (please show as a datastep) and what do you want the result of proc freq to look like?
I apologize for being not clear.
This is the dataset:
ID $ TestA TestB TestC TestD TestE
AB348 0 0 0 0 1
SB876 0 1 0 0 0
VB912 0 0 1 0 0
SK623 1 0 0 0 0
and so on--------
I want a new variable "Test" on which I should be able to perform the PROC FREQ:
and the output should look like
Test Frequency Percent Cumulative_Frequency Cumulative Percent
-----------------------------------------------------------------------------------------------------
TestA
TestB
TestC
TestD
TestE
Hopefully, I am clear now.
Thanks for helping me out.
Still not quite clear enough. Can a record be assigned to more than one test or are they mutually exclusive? And, will there ever be a case where all of the tests have a value of 0?
Thanks Arthur for your patience.
Yes, it can be assigned to more than one test.
for eg:
for one of the IDs it can be:
MJ376 0 1 1 0 1
Also, in some cases all the values can be zero.
Does the following accomplish what you want/need?:
data have;
input ID $ TestA TestB TestC TestD TestE;
cards;
AB348 0 0 1 0 1
SB876 0 1 1 0 0
VB912 0 0 0 0 0
SK623 1 0 0 1 0
;
data need (keep=id test);
set have;
length test $5;
array _test TestA--TestE;
do over _test;
if _test eq 1 then do;
test=vname(_test);
output;
end;
end;
run;
proc freq data=need;
table test;
run;
WOW...Thanks a lot, Arthur. That was really helpful.
I just have another quick question. How can I combine TestD and TestE and make a column name "Test_Var" and then I need to do the same thing with
the data variables as:
ID $ TestA TestB TestC Test_Var
Thank you very much.
One way would be to just extend the same logic. E.g.:
data have;
input ID $ TestA TestB TestC TestD TestE;
cards;
AB348 0 0 1 0 1
SB876 0 1 1 0 0
VB912 0 0 0 0 0
SK623 1 0 0 1 0
;
data need (keep=id test);
set have;
length test $8;
array _test TestA--TestE;
do over _test;
if _test eq 1 then do;
if vname(_test) in ('TestD','TestE') then test='Test_Var';
else test=vname(_test);
output;
end;
end;
run;
proc freq data=need;
table test;
run;
That certainly works. Thank you very much for your help.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.