Solved: Re: creating a new variable and changing wide to long format

Mscarboncopy · Posted 01-25-2021 02:18 PM

Hi I am trying to find out what is the best way to create a variable that I need and change the file format at the same time. My file looks like this (0 is not selected, 1 is selected). I need to create a variable called med that would have values 0 to 8 (depending on what was answered for med_1 to med_8). My issue is that if I use an array it will overwrite, as any ID (obs) can have multiple combinations of selected options (they can select all options or none and any combination in between). I could add all the possible combinations, however, my data file has other variables with 1-50 answer options that can be all selected, and I was hoping to do the same for those. I will not need the non selected (0). My goal is to have a final long data file that only contains the observations that selected at least one of the options. I have looked at wide to long articles and answers here, but I can't seem to adapt the information to what I need. Thank you.

OBS	med___1	med___2	med___3	med___4	med___5	med___6	med___7	med___8
1	0	0	0	0	0	0	0	1
2	1	0	1	0	0	0	0	0
3	1	0	0	0	0	0	0	0
4	1	1	1	0	0	0	0	0
5	0	0	0	0	0	0	0	0
6	1	1	1	1	1	0	0	0

Mscarboncopy · Posted 01-29-2021 08:11 AM

To update how I ended up solving this, after everyone's help. Thank you all again.

I used

proc transpose
  data=have
  out=long (
    where=(col1 ne 0)
  )
;
by obs;
var med:;
run;

data want;
set long;
med = scan(_name_,-1,"_");
drop _name_ col1;
run;

and replaced col1 ne 0 with col1 ne ' ' for the character variables and use RENAME col 1 in the data step for character variables instead of scan.

Data want2 (rename=(col1=mycharactervarname));

set long;

Run;

I transposed each variable separately and then merged them all. I have not done the date variables yet, but it should be straightforward. I used col2 to check for duplicates in my data file (before dropping it) and found out which ones they were and removed them from my have file (as when this happens, it indicates a data collection mistake in this case). I hope this can help others as well.

View solution in original post

PeterClemmensen · Posted 01-25-2021 02:26 PM

So what does your desired long data look like given this sample?

The DATA to DATA Step Macro
Blog: SASnrd

Mscarboncopy · Posted 01-25-2021 03:24 PM

5 is not there because it only had 0's nothing selected. it would look like this.

OBS	Med
1	8
2	1
2	3
3	1
4	1
4	2
4	3
6	1
6	2
6	3
6	4
6	5

ballardw · Posted 01-25-2021 03:07 PM

Since you mention that you have a lot of other variables it would be a good idea to share how you expect the data to look including some of those others.

It may help to provide a worked example using a smaller set of values, say med_1 to med_4 and similar for at least one other variable if it is also a multiple response value like these Med values.

Mscarboncopy · Posted 01-25-2021 03:25 PM

Yes, the others are also select not select, exactly like the med vars here. Thank you.

ballardw · Posted 01-25-2021 03:35 PM

@Mscarboncopy wrote:

Yes, the others are also select not select, exactly like the med vars here. Thank you.

You need to show what the output involving these other variables would be. It seems pretty obvious that there should be another another variable but how to generate it is not obvious because what if the Obs=1 has 3 values for the other variable? You currently have exactly 1 value for Med but what happens with the other variable involved? This can get to be a very complex issue depending on what you think you want.

Mscarboncopy · Posted 01-25-2021 03:51 PM

Oh, I see what you mean. I apologize.

So yes, it would have an issue right? Especially when I am adding variables that have 50+ options. So I added a var Living with 1-8 values just like Med. 0 for not selected, 1-8 to the selected ones.

OBS	Med	Living
1	8	2
1		3
2	1	1
2	3	2
2		3
3	1	0
4	1	1
4	2	2
4	3	3
4		5
6	1	1
6	2	2
6	3	0
6	4	0
6	5	0

Mscarboncopy · Posted 01-26-2021 07:18 AM

Also I can output the other variables that involve more than 50 selection options as their own file. This would allow me to have a smaller (separate) file with the variables that are like med and living(shown here). But I still need to know how to create the new vars, changing the format and how to handle the missing shown. Thank you.

Kurt_Bremser · Posted 01-26-2021 07:56 AM

Transpose first, and then transform the result:

data have;
input OBS med___1 med___2 med___3 med___4 med___5 med___6 med___7 med___8;
datalines;
1 0 0 0 0 0 0 0 1
2 1 0 1 0 0 0 0 0
3 1 0 0 0 0 0 0 0
4 1 1 1 0 0 0 0 0
5 0 0 0 0 0 0 0 0
6 1 1 1 1 1 0 0 0
;

proc transpose
  data=have
  out=long (
    where=(col1 ne 0)
  )
;
by obs;
var med:;
run;

data want;
set long;
med = scan(_name_,-1,"_");
drop _name_ col1;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Mscarboncopy · Posted 01-26-2021 11:41 AM

This is it! Thank you so much.

Questions:

1. Can I add other variables to this? If I am adding other vars (A B C) would this be correct below.

2. what is the vat col2? My output is showing col2 and they are all missing except one obs has a 1.

proc transpose
  data=have
  out=long (
    where=(col1 ne 0)
  )
;
by obs;
var med: A: B: C:;
run;

data want;
set long;
med = scan(_name_,-1,"_");
A= scan(_name_,-1,"_");
B = scan(_name_,-1,"_");
C= scan(_name_,-1,"_");
drop _name_ col1;
run;

Mscarboncopy · Posted 01-26-2021 11:49 AM

I see this does not work if I want to add more variables. It overwrites. This is the issue brought up before. So my option would be maybe proc transpose each variable and then merge the files ?

Mscarboncopy · Posted 01-26-2021 12:10 PM

When I transpose each variable and merge I then have the issue that the values of the first var will repeat to fill in blanks and the same will keep happening when I add more variables. I am trying to remember how we do this, to stop filling in blanks. First obs? I am not sure. I would use something like that in my merge statement, right? The way I have it now you can see the last ID has the same value repeating because it has 2 values for the second var. This is what others were telling me. And I believe it is okay to leave the others as missing. But I need to find out how to prevent the repetition to fill in blanks.

obs	med	living
1	7
2	4
3	8
4	2	2
5	2
6	3	1
6	3	8

Tom · Posted 01-26-2021 12:39 PM

To avoid the automatic RETAIN (the feature that makes one to many merges work right) you can use CALL MISSING().

data want;
   merge one two three;
   by id;
   output;
   call missing(of _all_);
run;

Mscarboncopy · Posted 01-26-2021 12:52 PM

That is perfect. Thank you Tom. Can I just ignore the var COL2 in my output? I am planning on dropping it, but just wanted to make sure I do not need to use it to check for issues.

Kurt_Bremser · Posted 01-27-2021 03:15 AM

When you get a COL2 variable, this means that the BY variable is not unique, there are two observations for at least one OBS in your original dataset.

For a transpose like this to work consistently, you need to make sure first that your BY variable is unique.

Then, you do a double transpose with some little preparation in between:

data have;
input id $ med_1 med_2 liv_1 liv_2;
datalines;
A 0 1 1 0
B 1 1 0 1
C 1 0 0 1
D 1 1 1 1
;

proc transpose
  data=have
  out=long1 (where=(col1 ne 0))
;
by id;
var med: liv:;
run;

data long2;
set long1;
seq = input(scan(_name_,2,"_"),best.);
_name_ = scan(_name_,1,"_");
run;

proc sort data=long2;
by id seq;
run;

proc transpose
  data=long2
  out=want (drop=seq _name_)
;
by id seq;
var seq;
id _name_;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!