BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Mscarboncopy
Pyrite | Level 9

Hi I am trying to find out what is the best way to create a variable that I need and change the file format at the same time. My file looks like this (0 is not selected, 1 is selected). I need to create a variable called med that would have values 0 to 8 (depending on what was answered for med_1 to med_8). My issue is that if I use an array it will overwrite, as any ID (obs) can have multiple combinations of selected options (they can select all options or none and any combination in between). I could add all the possible combinations, however, my data file has other variables with 1-50 answer options  that can be all selected, and I was hoping to do the same for those. I will not need the non selected (0). My goal is to have a final long data file that only contains the observations that selected at least one of the options. I have looked at wide to long articles and answers here, but I can't seem to adapt the information to what I need. Thank you. 

OBSmed___1med___2med___3med___4med___5med___6med___7med___8
100000001
210100000
310000000
411100000
500000000
611111000
1 ACCEPTED SOLUTION

Accepted Solutions
Mscarboncopy
Pyrite | Level 9

To update how I ended up solving this, after everyone's help. Thank you all again. 

I used

proc transpose
  data=have
  out=long (
    where=(col1 ne 0)
  )
;
by obs;
var med:;
run;

data want;
set long;
med = scan(_name_,-1,"_");
drop _name_ col1;
run;

 and replaced  col1 ne 0 with col1 ne  ' ' for the character variables and use RENAME col 1 in the data step for character variables instead of scan. 

Data want2 (rename=(col1=mycharactervarname));

set long;

Run;

I transposed each variable separately and then merged them all. I have not done the date variables yet, but it should be straightforward. I used col2 to check for duplicates in my data file (before dropping it) and found out which ones they were and removed them from my have  file (as when this happens, it indicates a data collection mistake in this case). I hope this can help others as well.

View solution in original post

33 REPLIES 33
PeterClemmensen
Tourmaline | Level 20

So what does your desired long data look like given this sample?

Mscarboncopy
Pyrite | Level 9

5 is not  there because it only had 0's nothing selected. it would look like this.

OBSMed
18
21
23
31
41
42
43
61
62
63
64
65
ballardw
Super User

Since you mention that you have a lot of other variables it would be a good idea to share how you expect the data to look including some of those others.

It may help to provide a worked example using a smaller set of values, say med_1 to med_4 and similar for at least one other variable if it is also a multiple response value like these Med values.

 

Mscarboncopy
Pyrite | Level 9

Yes, the others are also select not select, exactly like the med vars here. Thank you.

ballardw
Super User

@Mscarboncopy wrote:

Yes, the others are also select not select, exactly like the med vars here. Thank you.


You need to show what the output involving these other variables would be. It seems pretty obvious that there should be another another variable but how to generate it is not obvious because what if the Obs=1 has 3 values for the other variable? You currently have exactly 1 value for Med but what happens with the other variable involved? This can get to be a very complex issue depending on what you think you want.

Mscarboncopy
Pyrite | Level 9

Oh, I see what you mean. I apologize.

So yes, it would have an issue right? Especially when I am adding variables that have 50+ options. So I added a var Living with 1-8 values just like Med. 0 for not selected, 1-8 to the selected ones.

OBSMedLiving
182
1 3
211
232
2 3
310
411
422
433
4 5
611
622
630
640
650
Mscarboncopy
Pyrite | Level 9

Also I can output the other variables that involve more than 50 selection options as their own file. This would allow me to have a smaller (separate) file with the variables that are like med and living(shown here). But I still need to know how to create the new vars, changing the format and how to handle the missing shown. Thank you.

Kurt_Bremser
Super User

Transpose first, and then transform the result:

data have;
input OBS med___1 med___2 med___3 med___4 med___5 med___6 med___7 med___8;
datalines;
1 0 0 0 0 0 0 0 1
2 1 0 1 0 0 0 0 0
3 1 0 0 0 0 0 0 0
4 1 1 1 0 0 0 0 0
5 0 0 0 0 0 0 0 0
6 1 1 1 1 1 0 0 0
;

proc transpose
  data=have
  out=long (
    where=(col1 ne 0)
  )
;
by obs;
var med:;
run;

data want;
set long;
med = scan(_name_,-1,"_");
drop _name_ col1;
run;
Mscarboncopy
Pyrite | Level 9

This is it! Thank you so much.

Questions: 

1. Can I add other variables to this?  If I am adding other vars (A B C) would this be correct below.

2. what is the vat col2? My output is showing col2 and they are all missing except one obs has a 1.

proc transpose
  data=have
  out=long (
    where=(col1 ne 0)
  )
;
by obs;
var med: A: B: C:;
run;

data want;
set long;
med = scan(_name_,-1,"_");
A= scan(_name_,-1,"_");
B = scan(_name_,-1,"_");
C= scan(_name_,-1,"_");
drop _name_ col1;
run;

 

Mscarboncopy
Pyrite | Level 9

I see this does not work if I want to add more variables. It overwrites. This is the issue brought up before. So my option would be maybe proc transpose each variable and then merge the files ?

Mscarboncopy
Pyrite | Level 9

When I transpose each variable and merge I then have the issue that the values of the first var will repeat to fill in blanks and the same will keep happening when I add more variables. I am trying to remember how we do this, to stop filling in blanks. First obs? I am not sure. I would use something like that in my merge statement, right?  The way I have it now you can see the last ID has the same value repeating because it has 2 values for the second var. This is what others were telling me. And I believe it is okay to leave the others as missing. But I need to find out how to prevent the repetition to fill in blanks.

obsmedliving
17 
24 
38 
422
52 
631
638
Tom
Super User Tom
Super User

To avoid the automatic RETAIN (the feature that makes one to many merges work right) you can use CALL MISSING().

data want;
   merge one two three;
   by id;
   output;
   call missing(of _all_);
run;
Mscarboncopy
Pyrite | Level 9

That is perfect. Thank you Tom. Can I just ignore the var COL2 in my output? I am planning on dropping it, but just wanted to make sure I do not need to use it to check for issues.

Kurt_Bremser
Super User

When you get a COL2 variable, this means that the BY variable is not unique, there are two observations for at least one OBS in your original dataset.

For a transpose like this to work consistently, you need to make sure first that your BY variable is unique.

Then, you do a double transpose with some little preparation in between:

data have;
input id $ med_1 med_2 liv_1 liv_2;
datalines;
A 0 1 1 0
B 1 1 0 1
C 1 0 0 1
D 1 1 1 1
;

proc transpose
  data=have
  out=long1 (where=(col1 ne 0))
;
by id;
var med: liv:;
run;

data long2;
set long1;
seq = input(scan(_name_,2,"_"),best.);
_name_ = scan(_name_,1,"_");
run;

proc sort data=long2;
by id seq;
run;

proc transpose
  data=long2
  out=want (drop=seq _name_)
;
by id seq;
var seq;
id _name_;
run;

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 33 replies
  • 1353 views
  • 3 likes
  • 5 in conversation