Hi I am trying to find out what is the best way to create a variable that I need and change the file format at the same time. My file looks like this (0 is not selected, 1 is selected). I need to create a variable called med that would have values 0 to 8 (depending on what was answered for med_1 to med_8). My issue is that if I use an array it will overwrite, as any ID (obs) can have multiple combinations of selected options (they can select all options or none and any combination in between). I could add all the possible combinations, however, my data file has other variables with 1-50 answer options that can be all selected, and I was hoping to do the same for those. I will not need the non selected (0). My goal is to have a final long data file that only contains the observations that selected at least one of the options. I have looked at wide to long articles and answers here, but I can't seem to adapt the information to what I need. Thank you.
OBS | med___1 | med___2 | med___3 | med___4 | med___5 | med___6 | med___7 | med___8 |
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
2 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
6 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
To update how I ended up solving this, after everyone's help. Thank you all again.
I used
proc transpose
data=have
out=long (
where=(col1 ne 0)
)
;
by obs;
var med:;
run;
data want;
set long;
med = scan(_name_,-1,"_");
drop _name_ col1;
run;
and replaced col1 ne 0 with col1 ne ' ' for the character variables and use RENAME col 1 in the data step for character variables instead of scan.
Data want2 (rename=(col1=mycharactervarname));
set long;
Run;
I transposed each variable separately and then merged them all. I have not done the date variables yet, but it should be straightforward. I used col2 to check for duplicates in my data file (before dropping it) and found out which ones they were and removed them from my have file (as when this happens, it indicates a data collection mistake in this case). I hope this can help others as well.
So what does your desired long data look like given this sample?
5 is not there because it only had 0's nothing selected. it would look like this.
OBS | Med |
1 | 8 |
2 | 1 |
2 | 3 |
3 | 1 |
4 | 1 |
4 | 2 |
4 | 3 |
6 | 1 |
6 | 2 |
6 | 3 |
6 | 4 |
6 | 5 |
Since you mention that you have a lot of other variables it would be a good idea to share how you expect the data to look including some of those others.
It may help to provide a worked example using a smaller set of values, say med_1 to med_4 and similar for at least one other variable if it is also a multiple response value like these Med values.
Yes, the others are also select not select, exactly like the med vars here. Thank you.
@Mscarboncopy wrote:
Yes, the others are also select not select, exactly like the med vars here. Thank you.
You need to show what the output involving these other variables would be. It seems pretty obvious that there should be another another variable but how to generate it is not obvious because what if the Obs=1 has 3 values for the other variable? You currently have exactly 1 value for Med but what happens with the other variable involved? This can get to be a very complex issue depending on what you think you want.
Oh, I see what you mean. I apologize.
So yes, it would have an issue right? Especially when I am adding variables that have 50+ options. So I added a var Living with 1-8 values just like Med. 0 for not selected, 1-8 to the selected ones.
OBS | Med | Living |
1 | 8 | 2 |
1 | 3 | |
2 | 1 | 1 |
2 | 3 | 2 |
2 | 3 | |
3 | 1 | 0 |
4 | 1 | 1 |
4 | 2 | 2 |
4 | 3 | 3 |
4 | 5 | |
6 | 1 | 1 |
6 | 2 | 2 |
6 | 3 | 0 |
6 | 4 | 0 |
6 | 5 | 0 |
Also I can output the other variables that involve more than 50 selection options as their own file. This would allow me to have a smaller (separate) file with the variables that are like med and living(shown here). But I still need to know how to create the new vars, changing the format and how to handle the missing shown. Thank you.
Transpose first, and then transform the result:
data have;
input OBS med___1 med___2 med___3 med___4 med___5 med___6 med___7 med___8;
datalines;
1 0 0 0 0 0 0 0 1
2 1 0 1 0 0 0 0 0
3 1 0 0 0 0 0 0 0
4 1 1 1 0 0 0 0 0
5 0 0 0 0 0 0 0 0
6 1 1 1 1 1 0 0 0
;
proc transpose
data=have
out=long (
where=(col1 ne 0)
)
;
by obs;
var med:;
run;
data want;
set long;
med = scan(_name_,-1,"_");
drop _name_ col1;
run;
This is it! Thank you so much.
Questions:
1. Can I add other variables to this? If I am adding other vars (A B C) would this be correct below.
2. what is the vat col2? My output is showing col2 and they are all missing except one obs has a 1.
proc transpose
data=have
out=long (
where=(col1 ne 0)
)
;
by obs;
var med: A: B: C:;
run;
data want;
set long;
med = scan(_name_,-1,"_");
A= scan(_name_,-1,"_");
B = scan(_name_,-1,"_");
C= scan(_name_,-1,"_");
drop _name_ col1;
run;
I see this does not work if I want to add more variables. It overwrites. This is the issue brought up before. So my option would be maybe proc transpose each variable and then merge the files ?
When I transpose each variable and merge I then have the issue that the values of the first var will repeat to fill in blanks and the same will keep happening when I add more variables. I am trying to remember how we do this, to stop filling in blanks. First obs? I am not sure. I would use something like that in my merge statement, right? The way I have it now you can see the last ID has the same value repeating because it has 2 values for the second var. This is what others were telling me. And I believe it is okay to leave the others as missing. But I need to find out how to prevent the repetition to fill in blanks.
obs | med | living |
1 | 7 | |
2 | 4 | |
3 | 8 | |
4 | 2 | 2 |
5 | 2 | |
6 | 3 | 1 |
6 | 3 | 8 |
To avoid the automatic RETAIN (the feature that makes one to many merges work right) you can use CALL MISSING().
data want;
merge one two three;
by id;
output;
call missing(of _all_);
run;
That is perfect. Thank you Tom. Can I just ignore the var COL2 in my output? I am planning on dropping it, but just wanted to make sure I do not need to use it to check for issues.
When you get a COL2 variable, this means that the BY variable is not unique, there are two observations for at least one OBS in your original dataset.
For a transpose like this to work consistently, you need to make sure first that your BY variable is unique.
Then, you do a double transpose with some little preparation in between:
data have;
input id $ med_1 med_2 liv_1 liv_2;
datalines;
A 0 1 1 0
B 1 1 0 1
C 1 0 0 1
D 1 1 1 1
;
proc transpose
data=have
out=long1 (where=(col1 ne 0))
;
by id;
var med: liv:;
run;
data long2;
set long1;
seq = input(scan(_name_,2,"_"),best.);
_name_ = scan(_name_,1,"_");
run;
proc sort data=long2;
by id seq;
run;
proc transpose
data=long2
out=want (drop=seq _name_)
;
by id seq;
var seq;
id _name_;
run;
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: