Solved: Re: Looping through columns in groups of 4, combining columns and reas...

ngnikhilgoyal · Posted 04-17-2015 03:47 PM

Hi,

I have some data which looks like below. The columns are actually in groups of 4 which just represent a 4 point scale. i.e. columns a, b, c & d represent the same variable where a = 4, b = 3, c = 2 and d = 1. Similarly e = 4, f = 3, g = 2 and h = 1.

The value 1 in 'a' means , 'a' has been selected by respondent and the value to be assigned for columns a, b, c and d for record 1 is 4 on the 4-point scale. The value 1 in column g means for the group of e, f, g & h, g has been chosen as the correct option by respondent and a 2 should be assigned for variable represented by group e, f, g & h.

HAVE:

a	b	c	d	e	f	g	h
1	0	0	0	0	0	1	0
0	0	1	0	1	0	0	0
0	0	0	1	0	1	0	0
0	1	0	0	0	0	0	1

WANT:

VAR1 (a, b, c, d)	VAR2 (e, f, g, h)
4	2
2	4
1	3
3	1

How can i achieve this?

ballardw · Posted 04-20-2015 11:38 AM

There are a number of approaches available depending upon your level of coding comfort.

The easiest might be an array but the variables would need to be presented in the correct order for an array statement:

array group a b c d <etc>

The trickier part is creating the names for the NEW variables.

You can get a list of all of the variable names and count of variables into a macro variable by:

proc sql noprint;

select name, count(*)/4 into : varlist separated by ' ' : varcount

from dictionary.columns where libname='LIB'

and memname='DATA';

quit;

Put the name of your library and dataset in place of LIB and DATA IN CAPITAL LETTERS;

You can use %put to see the value in the log. If only variables you want appear, and are in the correct order in the Varlist variable you can use &varlist.

data want;

set have;

array group &varlist;

array newvar Var1 - Var&varcount;

do I = 1 to &varcount;

base = (I-1)*4;

newvar = input(cats(group[base+1],group[base+2],group[base+3],group[base+4]),groupfour.);

end;

drop I base;

View solution in original post

Reeza · Posted 04-17-2015 04:44 PM

ballardw · Posted 04-17-2015 04:49 PM

Here are two ways, one an assignment approach and the other with and informat. If you have lots of these variables I would recommend the format approach. Or go back to the survey software that created these and look at the export options.

proc format;

invalue groupfour

'1000'=4

'0100'=3

'0010'=2

'0001'=1

;

run;

data work.test;

input a b c d e f g h ;

var1 = a*4 + b*3 + c*2 + d;

var2 = input(cats(e,f,g,h),groupfour.);

datalines;

1 0 0 0 0 0 1 0

0 0 1 0 1 0 0 0

0 0 0 1 0 1 0 0

0 1 0 0 0 0 0 1

;

run;

ngnikhilgoyal · Posted 04-20-2015 11:15 AM

Hi BallardW,

the format approach is really cool . I have one question through - my original dataset has a lot of columns. Eventually the column names run into AB, AC , AD up to FV.

I don't want to type:

input(cats(e,f,g,h), groupfour.);

input(cats(i,j,k,l), groupfour.);

Isn't there someway I can ask SAS to apply groupfour. to successive 4 columns and keep creating new variable names?

thanks

Ksharp · Posted 04-18-2015 03:50 AM



data work.test;
input a b c d e f g h ;
array x{*} d c b a;
array y{*} h g f e;
v1=whichn(1,of x{*});
v2=whichn(1,of y{*});
datalines;
1 0 0 0 0 0 1 0
0 0 1 0 1 0 0 0
0 0 0 1 0 1 0 0
0 1 0 0 0 0 0 1
;
run;

Xia Keshan

ngnikhilgoyal · Posted 04-20-2015 11:21 AM

Thanks Xia, very good approach - I didn;t know the whichn function.

Is there a way we can save the keystrokes when creating array ? There are many columns in my dataset - Can we code SAS to go through 4 columns and make arrays out of them?

ballardw · Posted 04-20-2015 11:38 AM

There are a number of approaches available depending upon your level of coding comfort.

The easiest might be an array but the variables would need to be presented in the correct order for an array statement:

array group a b c d <etc>

The trickier part is creating the names for the NEW variables.

You can get a list of all of the variable names and count of variables into a macro variable by:

proc sql noprint;

select name, count(*)/4 into : varlist separated by ' ' : varcount

from dictionary.columns where libname='LIB'

and memname='DATA';

quit;

Put the name of your library and dataset in place of LIB and DATA IN CAPITAL LETTERS;

You can use %put to see the value in the log. If only variables you want appear, and are in the correct order in the Varlist variable you can use &varlist.

data want;

set have;

array group &varlist;

array newvar Var1 - Var&varcount;

do I = 1 to &varcount;

base = (I-1)*4;

newvar = input(cats(group[base+1],group[base+2],group[base+3],group[base+4]),groupfour.);

end;

drop I base;

ngnikhilgoyal · Posted 04-20-2015 12:36 PM

I get this error:

NOTE: Line generated by the macro variable "VARCOUNT".

1 Var 2

-

22

200

ERROR: Missing numeric suffix on a numbered variable list (Var1-Var).

WARNING: Defining an array with zero elements.

Though the macro variable &varcount is recognized, a numeric suffix is required it says.

ballardw · Posted 04-20-2015 02:06 PM

I missed a comma retyping into the forum:

varlist separated by ' ' , : varcount

ngnikhilgoyal · Posted 04-20-2015 02:10 PM

Oh no I corrected that. The error is occurring in the next block -

data want;

set test;

array group &varlist;

array newvar Var1 - Var&varcount;

do I = 1 to &varcount;

base = (I-1)*4;

newvar = input(cats(group[base+1],group[base+2],group[base+3],group[base+4]),groupfour.);

end;

drop I base;

OTE: Line generated by the macro variable "VARCOUNT".

1 Var 2

-

22

200

ERROR: Missing numeric suffix on a numbered variable list (Var1-Var).

WARNING: Defining an array with zero elements.

ERROR 22-322: Syntax error, expecting one of the following: a name, (, ;, _ALL_, _CHARACTER_,

_CHAR_, _NUMERIC_.

ERROR 200-322: The symbol is not recognized and will be ignored.

ballardw · Posted 04-20-2015 04:20 PM

Before the data step after creating varcount:

%let varcount=%left(&varcount);

The SQL is placing leading spaces.

ngnikhilgoyal · Posted 04-20-2015 01:06 PM

Trying another approach:

data test;

input a b c d e f g h i j k l;

datalines;

1 0 0 0 0 1 0 0 1 0 0 0

0 1 0 0 0 0 1 0 0 1 0 0

0 0 1 0 1 0 0 0 1 0 0 0

0 0 0 1 0 0 0 1 0 0 1 0

;

run;

proc transpose data = test out = test_t;

run;

data test_t;

set test_t;

retain block;

if _n_ = 1 then block = 1;

if mod(_n_/4,1) = 0.25 and _n_ gt 1 then block +1;

run;

This gives me:

Name	Col1	Col2	Col3	Col4	block
a	1	0	0	0	1
b	0	1	0	0	1
c	0	0	1	0	1
d	0	0	0	1	1
e	0	0	1	0	2
f	1	0	0	0	2
g	0	1	0	0	2
h	0	0	0	1	2
i	1	0	1	0	3
j	0	1	0	0	3
k	0	0	0	1	3
l	0	0	0	0	3

Is there a way I can group this by "block" to get something like :

block	col1	col2	col3	col4
1	1000	0100	0010	0001
2	0100	0010	1000	0001
3	1000	0100	1000	0010

thanks

Tom · Posted 04-20-2015 05:13 PM

You can use the WHICHN() function to figure our which one of the four has a 1 in it. But that assumes clean data with only one 1 in any group (or if more than 1 is set then that the highest one "wins").

data have;

input a b c d e f g h i j k l m n o p q r s t ;

cards;

1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0

;;;;

data want;

set have ;

array x a -- t ;

array value (5) ;

do row=1 to dim(x)/4 ;

value(row)=whichn(1,x(4*(row-1)+4),x(4*(row-1)+3),x(4*(row-1)+2),x(4*(row-1)+1)) ;

end;

put (x(*)) (2. 2. 2. 2. +2)

/ (value(*)) (2. +8)

;

run;

1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0

4 3 2 1 0

Looping through columns in groups of 4, combining columns and reassigning values

Re: Looping through columns in groups of 4, combining columns and reassigning values

Re: Looping through columns in groups of 4, combining columns and reassigning values

Re: Looping through columns in groups of 4, combining columns and reassigning values

Re: Looping through columns in groups of 4, combining columns and reassigning values

Re: Looping through columns in groups of 4, combining columns and reassigning values

Re: Looping through columns in groups of 4, combining columns and reassigning values

Re: Looping through columns in groups of 4, combining columns and reassigning values

Re: Looping through columns in groups of 4, combining columns and reassigning values

Re: Looping through columns in groups of 4, combining columns and reassigning values

Re: Looping through columns in groups of 4, combining columns and reassigning values

Re: Looping through columns in groups of 4, combining columns and reassigning values

Re: Looping through columns in groups of 4, combining columns and reassigning values

Re: Looping through columns in groups of 4, combining columns and reassigning values

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away