Hi,
I have some data which looks like below. The columns are actually in groups of 4 which just represent a 4 point scale. i.e. columns a, b, c & d represent the same variable where a = 4, b = 3, c = 2 and d = 1. Similarly e = 4, f = 3, g = 2 and h = 1.
The value 1 in 'a' means , 'a' has been selected by respondent and the value to be assigned for columns a, b, c and d for record 1 is 4 on the 4-point scale. The value 1 in column g means for the group of e, f, g & h, g has been chosen as the correct option by respondent and a 2 should be assigned for variable represented by group e, f, g & h.
HAVE:
a | b | c | d | e | f | g | h |
---|---|---|---|---|---|---|---|
1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 |
0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
WANT:
VAR1 (a, b, c, d) | VAR2 (e, f, g, h) |
---|---|
4 | 2 |
2 | 4 |
1 | 3 |
3 | 1 |
How can i achieve this?
There are a number of approaches available depending upon your level of coding comfort.
The easiest might be an array but the variables would need to be presented in the correct order for an array statement:
array group a b c d <etc>
The trickier part is creating the names for the NEW variables.
You can get a list of all of the variable names and count of variables into a macro variable by:
proc sql noprint;
select name, count(*)/4 into : varlist separated by ' ' : varcount
from dictionary.columns where libname='LIB'
and memname='DATA';
quit;
Put the name of your library and dataset in place of LIB and DATA IN CAPITAL LETTERS;
You can use %put to see the value in the log. If only variables you want appear, and are in the correct order in the Varlist variable you can use &varlist.
data want;
set have;
array group &varlist;
array newvar Var1 - Var&varcount;
do I = 1 to &varcount;
base = (I-1)*4;
newvar = input(cats(group[base+1],group[base+2],group[base+3],group[base+4]),groupfour.);
end;
drop I base;
Here are two ways, one an assignment approach and the other with and informat. If you have lots of these variables I would recommend the format approach. Or go back to the survey software that created these and look at the export options.
proc format;
invalue groupfour
'1000'=4
'0100'=3
'0010'=2
'0001'=1
;
run;
data work.test;
input a b c d e f g h ;
var1 = a*4 + b*3 + c*2 + d;
var2 = input(cats(e,f,g,h),groupfour.);
datalines;
1 0 0 0 0 0 1 0
0 0 1 0 1 0 0 0
0 0 0 1 0 1 0 0
0 1 0 0 0 0 0 1
;
run;
Hi BallardW,
the format approach is really cool . I have one question through - my original dataset has a lot of columns. Eventually the column names run into AB, AC , AD up to FV.
I don't want to type:
input(cats(e,f,g,h), groupfour.);
input(cats(i,j,k,l), groupfour.);
Isn't there someway I can ask SAS to apply groupfour. to successive 4 columns and keep creating new variable names?
thanks
data work.test; input a b c d e f g h ; array x{*} d c b a; array y{*} h g f e; v1=whichn(1,of x{*}); v2=whichn(1,of y{*}); datalines; 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 1 ; run;
Xia Keshan
Thanks Xia, very good approach - I didn;t know the whichn function.
Is there a way we can save the keystrokes when creating array ? There are many columns in my dataset - Can we code SAS to go through 4 columns and make arrays out of them?
There are a number of approaches available depending upon your level of coding comfort.
The easiest might be an array but the variables would need to be presented in the correct order for an array statement:
array group a b c d <etc>
The trickier part is creating the names for the NEW variables.
You can get a list of all of the variable names and count of variables into a macro variable by:
proc sql noprint;
select name, count(*)/4 into : varlist separated by ' ' : varcount
from dictionary.columns where libname='LIB'
and memname='DATA';
quit;
Put the name of your library and dataset in place of LIB and DATA IN CAPITAL LETTERS;
You can use %put to see the value in the log. If only variables you want appear, and are in the correct order in the Varlist variable you can use &varlist.
data want;
set have;
array group &varlist;
array newvar Var1 - Var&varcount;
do I = 1 to &varcount;
base = (I-1)*4;
newvar = input(cats(group[base+1],group[base+2],group[base+3],group[base+4]),groupfour.);
end;
drop I base;
I get this error:
NOTE: Line generated by the macro variable "VARCOUNT".
1 Var 2
-
22
200
ERROR: Missing numeric suffix on a numbered variable list (Var1-Var).
WARNING: Defining an array with zero elements.
Though the macro variable &varcount is recognized, a numeric suffix is required it says.
I missed a comma retyping into the forum:
varlist separated by ' ' , : varcount
Oh no I corrected that. The error is occurring in the next block -
data want;
set test;
array group &varlist;
array newvar Var1 - Var&varcount;
do I = 1 to &varcount;
base = (I-1)*4;
newvar = input(cats(group[base+1],group[base+2],group[base+3],group[base+4]),groupfour.);
end;
drop I base;
OTE: Line generated by the macro variable "VARCOUNT".
1 Var 2
-
22
200
ERROR: Missing numeric suffix on a numbered variable list (Var1-Var).
WARNING: Defining an array with zero elements.
ERROR 22-322: Syntax error, expecting one of the following: a name, (, ;, _ALL_, _CHARACTER_,
_CHAR_, _NUMERIC_.
ERROR 200-322: The symbol is not recognized and will be ignored.
Before the data step after creating varcount:
%let varcount=%left(&varcount);
The SQL is placing leading spaces.
Trying another approach:
data test;
input a b c d e f g h i j k l;
datalines;
1 0 0 0 0 1 0 0 1 0 0 0
0 1 0 0 0 0 1 0 0 1 0 0
0 0 1 0 1 0 0 0 1 0 0 0
0 0 0 1 0 0 0 1 0 0 1 0
;
run;
proc transpose data = test out = test_t;
run;
data test_t;
set test_t;
retain block;
if _n_ = 1 then block = 1;
if mod(_n_/4,1) = 0.25 and _n_ gt 1 then block +1;
run;
This gives me:
Name | Col1 | Col2 | Col3 | Col4 | block |
a | 1 | 0 | 0 | 0 | 1 |
b | 0 | 1 | 0 | 0 | 1 |
c | 0 | 0 | 1 | 0 | 1 |
d | 0 | 0 | 0 | 1 | 1 |
e | 0 | 0 | 1 | 0 | 2 |
f | 1 | 0 | 0 | 0 | 2 |
g | 0 | 1 | 0 | 0 | 2 |
h | 0 | 0 | 0 | 1 | 2 |
i | 1 | 0 | 1 | 0 | 3 |
j | 0 | 1 | 0 | 0 | 3 |
k | 0 | 0 | 0 | 1 | 3 |
l | 0 | 0 | 0 | 0 | 3 |
Is there a way I can group this by "block" to get something like :
block | col1 | col2 | col3 | col4 |
1 | 1000 | 0100 | 0010 | 0001 |
2 | 0100 | 0010 | 1000 | 0001 |
3 | 1000 | 0100 | 1000 | 0010 |
thanks
You can use the WHICHN() function to figure our which one of the four has a 1 in it. But that assumes clean data with only one 1 in any group (or if more than 1 is set then that the highest one "wins").
data have;
input a b c d e f g h i j k l m n o p q r s t ;
cards;
1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0
;;;;
data want;
set have ;
array x a -- t ;
array value (5) ;
do row=1 to dim(x)/4 ;
value(row)=whichn(1,x(4*(row-1)+4),x(4*(row-1)+3),x(4*(row-1)+2),x(4*(row-1)+1)) ;
end;
put (x(*)) (2. 2. 2. 2. +2)
/ (value(*)) (2. +8)
;
run;
1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0
4 3 2 1 0
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.