BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Mscarboncopy
Pyrite | Level 9

I have 60 variables (selected = 1, not selected = 0 VAR1 to VAR60) that I wanted to collapse into one variable with 60 keys (MAINVAR).

However, each one of my ids, could have more than 1 selected.

If I did a simple if then, it overwrites.

What is the most effective way of doing this?

It would then become a stacked file. Each id that has more than one variable selected would have multiple entries.

To make it a little bit more complex, for each selected there is a corresponding date var and also corresponding timing variable (past present selected not selected). However, I could do the same thing for the timing and the date variables, once I know how to do this (I think).

Example of what the datafile looks like now (leaving dates and timing out):

ID   VAR1 VAR2 VAR3 VAR4 VAR5 VAR6   etc…

A     1          0         1         1         0        0
B     0          1         0         0         0        0

C     1          1         1         1         0        0

D     0          0         0         1          1       1

 

Example of how it would be from the example above:

 

ID  MAINVAR (keys 1-60)

A      1

A      3

A      4

B      2

C      1

C      2

C      3

C      4

D      4

D      5

D      6

 

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

@mkeintz wrote:

This has DATA step with a do loop written all over it:

 

You need to 

  1. iterate (using a do loop) over all the VARs  (use an array)
  2. If the array element is 1 then
    1. assign the element-number within the array to MAINVAR
    2. output the result
data want (keep=id mainvar);
  set have;
  array _var {*} var: ;
  do mainvar=1 to dim(_var);
    if _var{mainvar}>0 then output;
  end;
run;

This relies entirely on the assumption that variable names range from VAR1 through VARxx, (xx=60 in your example) with no missing names in that list.


I don't think missing variable names in the VAR1 through VARxx is a problem in your code. I think actually it would be a problem in my code, but that's easily fixed.

--
Paige Miller

View solution in original post

8 REPLIES 8
PaigeMiller
Diamond | Level 26
data have;
Input id $ VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 ;
cards;
A     1          0         1         1         0        0
B     0          1         0         0         0        0
C     1          1         1         1         0        0
D     0          0         0         1          1       1
;

data want;
    set have;
    array v var1-var6;
    do mainvar=1 to dim(v);
        if v(mainvar)=1 then output;
    end;
    keep id mainvar;
run;
--
Paige Miller
Mscarboncopy
Pyrite | Level 9

This worked really well. Thank you.

Is there a way to do this for data variables? The main variable in this case would be dates instead of 1 selected.

PaigeMiller
Diamond | Level 26

@Mscarboncopy wrote:

This worked really well. Thank you.

Is there a way to do this for data variables? The main variable in this case would be dates instead of 1 selected.


I think this needs clarification and an example.

--
Paige Miller
mkeintz
PROC Star

This has DATA step with a do loop written all over it:

 

You need to 

  1. iterate (using a do loop) over all the VARs  (use an array)
  2. If the array element is 1 then
    1. assign the element-number within the array to MAINVAR
    2. output the result
data want (keep=id mainvar);
  set have;
  array _var {*} var: ;
  do mainvar=1 to dim(_var);
    if _var{mainvar}>0 then output;
  end;
run;

This relies entirely on the assumption that variable names range from VAR1 through VARxx, (xx=60 in your example) with no missing names in that list.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
PaigeMiller
Diamond | Level 26

@mkeintz wrote:

This has DATA step with a do loop written all over it:

 

You need to 

  1. iterate (using a do loop) over all the VARs  (use an array)
  2. If the array element is 1 then
    1. assign the element-number within the array to MAINVAR
    2. output the result
data want (keep=id mainvar);
  set have;
  array _var {*} var: ;
  do mainvar=1 to dim(_var);
    if _var{mainvar}>0 then output;
  end;
run;

This relies entirely on the assumption that variable names range from VAR1 through VARxx, (xx=60 in your example) with no missing names in that list.


I don't think missing variable names in the VAR1 through VARxx is a problem in your code. I think actually it would be a problem in my code, but that's easily fixed.

--
Paige Miller
mkeintz
PROC Star

@PaigeMiller wrote:

@mkeintz wrote:

This has DATA step with a do loop written all over it:

 

You need to 

  1. iterate (using a do loop) over all the VARs  (use an array)
  2. If the array element is 1 then
    1. assign the element-number within the array to MAINVAR
    2. output the result
data want (keep=id mainvar);
  set have;
  array _var {*} var: ;
  do mainvar=1 to dim(_var);
    if _var{mainvar}>0 then output;
  end;
run;

This relies entirely on the assumption that variable names range from VAR1 through VARxx, (xx=60 in your example) with no missing names in that list.


I don't think missing variable names in the VAR1 through VARxx is a problem in your code. I think actually it would be a problem in my code, but that's easily fixed.


What I was trying to warn against was starting out with, say  VAR1 VAR2 VAR5 VAR6.   They would form a 4-element array, with MAINVAR=3 for VAR5, and MAINVAR=4 for VAR6.   Of course, I could avoid that problem by changing

array _var {*} var: ;

to

array _var {*} var1-var60 ;

which is what I should have done to begin with.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Mscarboncopy
Pyrite | Level 9

I don't have missing.

 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 975 views
  • 3 likes
  • 4 in conversation