BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ngnikhilgoyal
Calcite | Level 5

Hi,

I have some data which looks like below. The columns are actually in groups of 4 which just represent a 4 point scale. i.e. columns a, b, c & d represent the same variable where a = 4, b = 3, c = 2 and d = 1. Similarly e = 4, f = 3, g = 2 and h = 1.

The value 1 in 'a' means , 'a' has been selected by respondent and the value to be assigned for columns a, b, c and d for record 1 is 4 on the 4-point scale. The value 1 in column g means for the group of e, f, g & h, g has been chosen as the correct option by respondent and a 2 should be assigned for variable represented by group e, f, g & h.

HAVE:

abcdefgh
10000010
00101000
00010100
01000001

WANT:

VAR1 (a, b, c, d)VAR2 (e, f, g, h)
42
24
13
31

How can i achieve this?

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

There are a number of approaches available depending upon your level of coding comfort.

The easiest might be an array but the variables would need to be presented in the correct order for an array statement:

array group a b c d <etc>

The trickier part is creating the names for the NEW variables.

You can get a list of all of the variable names  and count of variables into a macro variable by:

proc sql noprint;

     select name, count(*)/4 into : varlist separated by ' ' : varcount

     from dictionary.columns where libname='LIB'

          and memname='DATA';

quit;

Put the name of your library and dataset in place of LIB and DATA IN CAPITAL LETTERS;

You can use %put to see the value in the log. If only variables you want appear, and are in the correct order in the Varlist variable you can use &varlist.

data want;

set have;

array group &varlist;

array newvar  Var1 - Var&varcount;

do I = 1 to &varcount;

     base = (I-1)*4;

     newvar =  input(cats(group[base+1],group[base+2],group[base+3],group[base+4]),groupfour.);

end;

drop I base;

View solution in original post

12 REPLIES 12
ballardw
Super User

Here are two ways, one an assignment approach and the other with and informat. If you have lots of these variables I would recommend the format approach. Or go back to the survey software that created these and look at the export options.

proc format;

invalue groupfour

'1000'=4

'0100'=3

'0010'=2

'0001'=1

;

run;

data work.test;

input a b c d e f g h ;

var1 = a*4 + b*3 + c*2 + d;

var2 = input(cats(e,f,g,h),groupfour.);

datalines;

1 0 0 0 0 0 1 0

0 0 1 0 1 0 0 0

0 0 0 1 0 1 0 0

0 1 0 0 0 0 0 1

;

run;

ngnikhilgoyal
Calcite | Level 5

Hi BallardW,

the format approach is really cool . I have one question through - my original dataset has a lot of columns. Eventually the column names run into AB, AC , AD up to FV.

I don't want to type:

input(cats(e,f,g,h), groupfour.);

input(cats(i,j,k,l), groupfour.);

Isn't there someway I can ask SAS to apply groupfour. to successive 4 columns and keep creating new variable names?

thanks

Ksharp
Super User


data work.test;
input a b c d e f g h ;
array x{*} d c b a;
array y{*} h g f e;
v1=whichn(1,of x{*});
v2=whichn(1,of y{*});
datalines;
1 0 0 0 0 0 1 0
0 0 1 0 1 0 0 0
0 0 0 1 0 1 0 0
0 1 0 0 0 0 0 1
;
run;

Xia Keshan

ngnikhilgoyal
Calcite | Level 5

Thanks Xia, very good approach - I didn;t know the whichn function.

Is there a way we can save the keystrokes when creating array ? There are many columns in my dataset - Can we code SAS to go through 4 columns and make arrays out of them?

ballardw
Super User

There are a number of approaches available depending upon your level of coding comfort.

The easiest might be an array but the variables would need to be presented in the correct order for an array statement:

array group a b c d <etc>

The trickier part is creating the names for the NEW variables.

You can get a list of all of the variable names  and count of variables into a macro variable by:

proc sql noprint;

     select name, count(*)/4 into : varlist separated by ' ' : varcount

     from dictionary.columns where libname='LIB'

          and memname='DATA';

quit;

Put the name of your library and dataset in place of LIB and DATA IN CAPITAL LETTERS;

You can use %put to see the value in the log. If only variables you want appear, and are in the correct order in the Varlist variable you can use &varlist.

data want;

set have;

array group &varlist;

array newvar  Var1 - Var&varcount;

do I = 1 to &varcount;

     base = (I-1)*4;

     newvar =  input(cats(group[base+1],group[base+2],group[base+3],group[base+4]),groupfour.);

end;

drop I base;

ngnikhilgoyal
Calcite | Level 5

I get this error:

NOTE: Line generated by the macro variable "VARCOUNT".

1     Var       2

                -

                22

                200

ERROR: Missing numeric suffix on a numbered variable list (Var1-Var).

WARNING: Defining an array with zero elements.

Though the macro variable &varcount is recognized, a numeric suffix is required it says.

ballardw
Super User

I missed a comma retyping into the forum:

varlist separated by ' ' , : varcount

ngnikhilgoyal
Calcite | Level 5

Oh no I corrected that. The error is occurring in the next block -

data want;

set test;

array group &varlist;

array newvar  Var1 - Var&varcount;

do I = 1 to &varcount;

     base = (I-1)*4;

     newvar =  input(cats(group[base+1],group[base+2],group[base+3],group[base+4]),groupfour.);

end;

drop I base;

OTE: Line generated by the macro variable "VARCOUNT".

1     Var       2

                -

                22

                200

ERROR: Missing numeric suffix on a numbered variable list (Var1-Var).

WARNING: Defining an array with zero elements.

ERROR 22-322: Syntax error, expecting one of the following: a name, (, ;, _ALL_, _CHARACTER_,

              _CHAR_, _NUMERIC_.

ERROR 200-322: The symbol is not recognized and will be ignored.

ballardw
Super User

Before the data step after creating varcount:

%let varcount=%left(&varcount);

The SQL is placing leading spaces.

ngnikhilgoyal
Calcite | Level 5

Trying another approach:

data test;

   input a b c d e f g h i j k l;

   datalines;

1 0 0 0 0 1 0 0 1 0 0 0

0 1 0 0 0 0 1 0 0 1 0 0

0 0 1 0 1 0 0 0 1 0 0 0

0 0 0 1 0 0 0 1 0 0 1 0

;

run;

proc transpose data = test out = test_t;

run;

data test_t;

set test_t;

retain block;

if _n_ = 1 then block = 1;

if mod(_n_/4,1) = 0.25 and _n_ gt 1 then block +1;

run;

This gives me:

NameCol1Col2Col3Col4block
a10001
b01001
c00101
d00011
e00102
f10002
g01002
h00012
i10103
j01003
k00013
l00003

Is there a way I can group this by "block" to get something like :

blockcol1col2col3col4
11000010000100001
20100001010000001
31000010010000010

thanks

Tom
Super User Tom
Super User

You can use the WHICHN() function to figure our which one of the four has a 1 in it.  But that assumes clean data with only one 1 in any group (or if more than 1 is set then that the highest one "wins").

data have;

  input a b c d e f g h i j k l m n o p q r s t ;

cards;

1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0

;;;;

data want;

  set have ;

  array x a -- t ;

  array value (5) ;

  do row=1 to dim(x)/4 ;

   value(row)=whichn(1,x(4*(row-1)+4),x(4*(row-1)+3),x(4*(row-1)+2),x(4*(row-1)+1)) ;

  end;

  put (x(*)) (2. 2. 2. 2. +2)

    / (value(*)) (2. +8)

  ;

run;

1 0 0 0   0 1 0 0   0 0 1 0   0 0 0 1   0 0 0 0

4         3         2         1         0


hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 3687 views
  • 7 likes
  • 5 in conversation