11-25-2017 01:34 PM
I have a dataset with some 12,000 variables, and I need to collapse it so that no ID number is repeated and each ID's symptoms are described on one line. The data I am working with looks like this:
And I am trying to get it to look like this:
The data is from Excel, so I imported it and came up with the following code:
proc sort data= project3; by id_no; run;
*need array to identify symptoms
*symptom 1=heartburn, symptom 2=sickness, symptom 3=spasm, symptom 4=temperature, symptom 5=tiredness
*dont want to keep symptom_no and symptom, instead make new variable;
data want;
  array symptoms symptom_no1-symptom_no5; *I named the arrays symptom instead of sympt to create the new variables;
  retain symptom_no1-symptom_no5;
  set project3;
  by id_no;
  if first.id_no then do i=1 to 5; *this allowed me to not have any duplicates for the ID_no;
    symptoms[i]=.;
  end;
  if last.id_no then output;
  keep id_no symptom_no1-symptom_no5;
run;
proc print data=want;
  var id_no symptom_no1-symptom_no5;
run;
Unfortunately, when I run this, nothing is populated for symptoms and I end up getting this:
I understand that I have to define the symptoms so that, when the system reads the data, it knows that symptom 1 is heartburns, 2 is sickness, 3 is spasm, 4 is temperature, and 5 is tiredness. I'm guessing this should go before the symptom array I've already written, but I am having a hard time. Could I get some direction or advice, please?
11-25-2017 06:49 PM
What are the other 11,997 variables?
Your example input only shows 3 variables. And two of those are showing the same information in different ways. When SYMPTOM_NO=1 then SYMPTOM is always "Heartburns".
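Since SYMPTOM_NO already identifies the symptom, one possible approach is to use it directly as the array index. The data step in the question resets the array at first.id_no but never assigns anything to it, which would explain the empty columns. The following is only a rough sketch under assumptions: that SYMPTOM_NO is numeric with values 1 to 5, that SYMPTOM is a character variable, and that 20 characters is long enough for the text. The output names symptom1-symptom5 are made up for illustration.

```sas
proc sort data=project3;
  by id_no;
run;

data want;
  /* Hypothetical output variables: one character slot per symptom number. */
  /* The length of 20 is an assumption; adjust it to the longest symptom.  */
  array symptoms[5] $ 20 symptom1-symptom5;
  retain symptom1-symptom5;
  set project3;
  by id_no;
  /* Clear all slots when a new ID starts */
  if first.id_no then call missing(of symptoms[*]);
  /* Use SYMPTOM_NO as the index, guarding against out-of-range values */
  if 1 <= symptom_no <= 5 then symptoms[symptom_no] = symptom;
  /* Write one row per ID once its last record has been read */
  if last.id_no then output;
  keep id_no symptom1-symptom5;
run;

proc print data=want;
  var id_no symptom1-symptom5;
run;
```

If a 0/1 flag per symptom is preferred over the text, the same pattern should work with a numeric array and symptoms[symptom_no] = 1 in place of the character assignment.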