I have 60 variables (selected = 1, not selected = 0 VAR1 to VAR60) that I wanted to collapse into one variable with 60 keys (MAINVAR).
However, each one of my ids, could have more than 1 selected.
If I did a simple if then, it overwrites.
What is the most effective way of doing this?
It would then become a stacked file. Each id that has more than one variable selected would have multiple entries.
To make it a little bit more complex, for each selected there is a corresponding date var and also corresponding timing variable (past present selected not selected). However, I could do the same thing for the timing and the date variables, once I know how to do this (I think).
Example of what the datafile looks like now (leaving dates and timing out):
ID VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 etc…
A 1 0 1 1 0 0
B 0 1 0 0 0 0
C 1 1 1 1 0 0
D 0 0 0 1 1 1
Example of how it would be from the example above:
ID MAINVAR (keys 1-60)
A 1
A 3
A 4
B 2
C 1
C 2
C 3
C 4
D 4
D 5
D 6
Thank you!
@mkeintz wrote:
This has DATA step with a do loop written all over it:
You need to
- iterate (using a do loop) over all the VARs (use an array)
- If the array element is 1 then
- assign the element-number within the array to MAINVAR
- output the result
data want (keep=id mainvar); set have; array _var {*} var: ; do mainvar=1 to dim(_var); if _var{mainvar}>0 then output; end; run;
This relies entirely on the assumption that variable names range from VAR1 through VARxx, (xx=60 in your example) with no missing names in that list.
I don't think missing variable names in the VAR1 through VARxx is a problem in your code. I think actually it would be a problem in my code, but that's easily fixed.
data have;
Input id $ VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 ;
cards;
A 1 0 1 1 0 0
B 0 1 0 0 0 0
C 1 1 1 1 0 0
D 0 0 0 1 1 1
;
data want;
set have;
array v var1-var6;
do mainvar=1 to dim(v);
if v(mainvar)=1 then output;
end;
keep id mainvar;
run;
This worked really well. Thank you.
Is there a way to do this for data variables? The main variable in this case would be dates instead of 1 selected.
@Mscarboncopy wrote:
This worked really well. Thank you.
Is there a way to do this for data variables? The main variable in this case would be dates instead of 1 selected.
I think this needs clarification and an example.
This has DATA step with a do loop written all over it:
You need to
data want (keep=id mainvar);
set have;
array _var {*} var: ;
do mainvar=1 to dim(_var);
if _var{mainvar}>0 then output;
end;
run;
This relies entirely on the assumption that variable names range from VAR1 through VARxx, (xx=60 in your example) with no missing names in that list.
@mkeintz wrote:
This has DATA step with a do loop written all over it:
You need to
- iterate (using a do loop) over all the VARs (use an array)
- If the array element is 1 then
- assign the element-number within the array to MAINVAR
- output the result
data want (keep=id mainvar); set have; array _var {*} var: ; do mainvar=1 to dim(_var); if _var{mainvar}>0 then output; end; run;
This relies entirely on the assumption that variable names range from VAR1 through VARxx, (xx=60 in your example) with no missing names in that list.
I don't think missing variable names in the VAR1 through VARxx is a problem in your code. I think actually it would be a problem in my code, but that's easily fixed.
@PaigeMiller wrote:
@mkeintz wrote:
This has DATA step with a do loop written all over it:
You need to
- iterate (using a do loop) over all the VARs (use an array)
- If the array element is 1 then
- assign the element-number within the array to MAINVAR
- output the result
data want (keep=id mainvar); set have; array _var {*} var: ; do mainvar=1 to dim(_var); if _var{mainvar}>0 then output; end; run;
This relies entirely on the assumption that variable names range from VAR1 through VARxx, (xx=60 in your example) with no missing names in that list.
I don't think missing variable names in the VAR1 through VARxx is a problem in your code. I think actually it would be a problem in my code, but that's easily fixed.
What I was trying to warn against was starting out with, say VAR1 VAR2 VAR5 VAR6. They would form a 4-element array, with MAINVAR=3 for VAR5, and MAINVAR=4 for VAR6. Of course, I could avoid that problem by changing
array _var {*} var: ;
to
array _var {*} var1-var60 ;
which is what I should have done to begin with.
I don't have missing.
Here's some references you may find useful.
Transposing data tutorials:
Wide to Long:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-wide-to-long-using-proc-transpose/
https://stats.idre.ucla.edu/sas/modules/reshaping-data-wide-to-long-using-a-data-step/
And sometimes a double transpose is needed for extra wide data sets:
https://gist.github.com/statgeek/2321b6f62ab78d5bf2b0a5a8626bd7cd
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.