Hello everyone,
I’m new to SAS and have been trying to figure out how to solve the following problem:
Let’s assume I have the following dataset:
data specialty;
input ID specialty $12. ;
cards;
1 Neurologist
1 Psychologist
2 GP
3 Neurologist
4 Cardiologist
5 Cardiologist
;
run;
What I want to do now is to create a new column for every physician specialty from the column “specialty” and put a “1” in the respective column if it fits and zero otherwise.
So the result should look like this:
ID | specialty | Neurologist | Psychologist | GP | Cardiologist |
1 | Neurologist | 1 | 0 | 0 | 0 |
1 | Psychologist | 0 | 1 | 0 | 0 |
2 | GP | 0 | 0 | 1 | 0 |
3 | Neurologist | 1 | 0 | 0 | 0 |
4 | Cardiologist | 0 | 0 | 0 | 1 |
5 | Cardiologist | 0 | 0 | 0 | 1 |
I solved the problem with creating every variable manually and lots of “if then statements” for now but I’m not very happy with this. Is there a more elaborated way to achieve the desired outcome?
Thank you!
data specialty;
input ID specialty $12. ;
x = 1;
cards;
1 Neurologist
1 Psychologist
2 GP
3 Neurologist
4 Cardiologist
5 Cardiologist
;
run;
proc transpose data=specialty out=wide (drop=_NAME_);
by ID;
id specialty;
var x;
run;
This basically provides what you wanted (having missing values instead of zero), but you only get 1 observation for ID=1, where both columns Neuroloigist and Psychologist have the value 1.
data specialty;
input ID specialty $12. ;
x = 1;
cards;
1 Neurologist
1 Psychologist
2 GP
3 Neurologist
4 Cardiologist
5 Cardiologist
;
run;
proc transpose data=specialty out=wide (drop=_NAME_);
by ID;
id specialty;
var x;
run;
This basically provides what you wanted (having missing values instead of zero), but you only get 1 observation for ID=1, where both columns Neuroloigist and Psychologist have the value 1.
Thank you Kurt and RW9 for the quick responses. That is exactly what I needed!
Looks like a report to me. What is the intended use for the output data set?
I believe you can use cross tabulations in PROC FREQ for this.
Well, this will give you what you want:
data specialty; input ID specialty $12. ; res=1; cards; 1 Neurologist 1 Psychologist 2 GP 3 Neurologist 4 Cardiologist 5 Cardiologist ; run; proc transpose data=specialty out=want; by id specialty; var res; id specialty; idlabel specialty; run;
However I would advise against this approach, if you variables change then your going to need to reprogram. Its generally a good idea to have a general prefix and an incrementor so you can use arrays, you can still label them what ever your like, but its so much easier to code with:
data specialty; input ID specialty $12. ; res=1; cards; 1 Neurologist 1 Psychologist 2 GP 3 Neurologist 4 Cardiologist 5 Cardiologist ; run; data want; set specialty; array var{4} 8.; select(specialty); when ("Neurologist") var{1}=1; when ("Psychologist") var{2}=1; when ("GP") var{3}=1; when ("Cardiologist") var{4}=1; otherwise; end; run;
You can then refer to the array of vars in further processing.
data specialty;
input ID specialty $12. ;
x = 1;
n+1;
cards;
1 Neurologist
1 Psychologist
2 GP
3 Neurologist
4 Cardiologist
5 Cardiologist
;
run;
proc transpose data=specialty out=wide (drop=_NAME_ n);
by n;
id specialty;
var x;
copy id specialty;
run;
proc stdize data=wide out=want missing=0 reponly;run;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.