Hello everyone,
I’m new to SAS and have been trying to figure out how to solve the following problem:
Let’s assume I have the following dataset:
data specialty;
input ID specialty $12. ;
cards;
1 Neurologist
1 Psychologist
2 GP
3 Neurologist
4 Cardiologist
5 Cardiologist
;
run;
What I want to do now is to create a new column for every physician specialty from the column “specialty” and put a “1” in the respective column if it fits and zero otherwise.
So the result should look like this:
ID | specialty | Neurologist | Psychologist | GP | Cardiologist |
1 | Neurologist | 1 | 0 | 0 | 0 |
1 | Psychologist | 0 | 1 | 0 | 0 |
2 | GP | 0 | 0 | 1 | 0 |
3 | Neurologist | 1 | 0 | 0 | 0 |
4 | Cardiologist | 0 | 0 | 0 | 1 |
5 | Cardiologist | 0 | 0 | 0 | 1 |
I solved the problem with creating every variable manually and lots of “if then statements” for now but I’m not very happy with this. Is there a more elaborated way to achieve the desired outcome?
Thank you!
data specialty;
input ID specialty $12. ;
x = 1;
cards;
1 Neurologist
1 Psychologist
2 GP
3 Neurologist
4 Cardiologist
5 Cardiologist
;
run;
proc transpose data=specialty out=wide (drop=_NAME_);
by ID;
id specialty;
var x;
run;
This basically provides what you wanted (having missing values instead of zero), but you only get 1 observation for ID=1, where both columns Neuroloigist and Psychologist have the value 1.
data specialty;
input ID specialty $12. ;
x = 1;
cards;
1 Neurologist
1 Psychologist
2 GP
3 Neurologist
4 Cardiologist
5 Cardiologist
;
run;
proc transpose data=specialty out=wide (drop=_NAME_);
by ID;
id specialty;
var x;
run;
This basically provides what you wanted (having missing values instead of zero), but you only get 1 observation for ID=1, where both columns Neuroloigist and Psychologist have the value 1.
Thank you Kurt and RW9 for the quick responses. That is exactly what I needed!
Looks like a report to me. What is the intended use for the output data set?
I believe you can use cross tabulations in PROC FREQ for this.
Well, this will give you what you want:
data specialty; input ID specialty $12. ; res=1; cards; 1 Neurologist 1 Psychologist 2 GP 3 Neurologist 4 Cardiologist 5 Cardiologist ; run; proc transpose data=specialty out=want; by id specialty; var res; id specialty; idlabel specialty; run;
However I would advise against this approach, if you variables change then your going to need to reprogram. Its generally a good idea to have a general prefix and an incrementor so you can use arrays, you can still label them what ever your like, but its so much easier to code with:
data specialty; input ID specialty $12. ; res=1; cards; 1 Neurologist 1 Psychologist 2 GP 3 Neurologist 4 Cardiologist 5 Cardiologist ; run; data want; set specialty; array var{4} 8.; select(specialty); when ("Neurologist") var{1}=1; when ("Psychologist") var{2}=1; when ("GP") var{3}=1; when ("Cardiologist") var{4}=1; otherwise; end; run;
You can then refer to the array of vars in further processing.
data specialty;
input ID specialty $12. ;
x = 1;
n+1;
cards;
1 Neurologist
1 Psychologist
2 GP
3 Neurologist
4 Cardiologist
5 Cardiologist
;
run;
proc transpose data=specialty out=wide (drop=_NAME_ n);
by n;
id specialty;
var x;
copy id specialty;
run;
proc stdize data=wide out=want missing=0 reponly;run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.