Hello,
I am working with a long dataset consisting of data for school policy violation by students. In the long format, the dataset has more than 6000 cases and 20 variables. The variables of interest for this question are: id, incident_date, incident_number, incident_type, and violation type.
As shown in the hypothetical example below, a) a student can have multiple incident numbers and b) a student can have different types of violations that fall under the same incident type; and c) different students have different number of incidents and violations.
How do I reshape this long dataset into a wide dataset. I am trying to create an index variable but have not succeeded in doing so .
Thank you for your help.
--Angi
I need to reshape this long dataset into a wide dataset in the following manner:
Have:
id | incident_date | incident_number | incident_type | violation_type |
1 | 15-Feb-09 | 94180 | Drugs and Drug Paraphernalia | Violation of law |
1 | 15-Feb-09 | 94180 | Drugs and Drug Paraphernalia | Drugs and drug paraphernalia |
2 | 27-Jul-09 | 91703 | Alcohol | Alcohol |
2 | 27-Jul-09 | 91703 | Alcohol | Disorderly conduct |
2 | 27-Jul-09 | 91703 | Alcohol | Reckless endangerment |
2 | 27-Jul-09 | 91703 | Alcohol | Violation of law |
2 | 14-Sep-09 | 92335 | Alcohol | Violation of law |
2 | 14-Sep-09 | 92335 | Alcohol | Alcohol |
3 | 31-Oct-09 | 93071 | Polysubstance | Disorderly conduct |
3 | 31-Oct-09 | 93071 | Polysubstance | Violation of law |
3 | 16-Nov-09 | 93273 | Theft or damage to property | Theft or damage to property |
3 | 16-Nov-09 | 93273 | Theft or damage to property | Violation of law |
3 | 16-Nov-09 | 93273 | Theft or damage to property | Alcohol |
4 | 11-Jul-09 | 91697 | Drugs and Drug Paraphernalia | Violation of law |
4 | 11-Jul-09 | 91697 | Drugs and Drug Paraphernalia | Drugs and drug paraphernalia |
4 | 5-Dec-09 | 93443 | Alcohol - Medical | Violation of law |
4 | 5-Dec-09 | 93443 | Alcohol - Medical | Alcohol |
4 | 13-Mar-10 | 94360 | Trespassing | Trespassing |
Want:
id | incident_ date1_1 | incident_ number1_1 | incident_ type1_1 | incident 1_1 | incident_ date 1_2 | incident_ number 1_2 | incident_ type1_2 | Incident1_2 | Incident_date1_3 |
1
| 15-Feb-09 | 94180 | Drugs and Drug Paraphernalia | Violation of law | 15-Feb-09 | 94180 | Drugs and Drug Paraphernalia | Drugs and drug paraphernalia | |
2 | 27-Jul-09 | 91703 | Alcohol | Alcohol | 27-Jul-09 | 91703 | Alcohol | Disorderly conduct | 27-Jul-09 |
3 | 31-Oct-09 | 93071 | Polysubstance | Disorderly conduct | 31-Oct-09 | 93071 | Polysubstance | Violation of law |
where,
incident1_1 would represent the 1st type of violation for 1st incident
incident1_2 would represent the 2nd type of violation for 1st incident
incident1_3 would represent the 3rd type of violation for 1st incident
incident2_1 would represent the 1st type of violation for 2nd incident
and so on..
Message was edited by: Angi Stha
Sorry. My bad. Try this one :
data have; input id incident_date : date11. incident_number incident_type & $40. violation_type & $40. ; format incident_date date11.; cards; 1 15-Feb-09 94180 Drugs and Drug Paraphernalia Violation of law 1 15-Feb-09 94180 Drugs and Drug Paraphernalia Drugs and drug paraphernalia 2 27-Jul-09 91703 Alcohol Alcohol 2 27-Jul-09 91703 Alcohol Disorderly conduct 2 27-Jul-09 91703 Alcohol Reckless endangerment 2 27-Jul-09 91703 Alcohol Violation of law 2 14-Sep-09 92335 Alcohol Violation of law 2 14-Sep-09 92335 Alcohol Alcohol ; run; data have; set have; by id incident_date incident_number; if first.id then _id=0; _id+first.incident_number; if first.incident_number then n=0; n+1; run; proc sql ; create table temp as select distinct _id,n from have; quit; data _null_; set temp end=last; if _n_ eq 1 then call execute('data want;merge '); call execute(catt('have(where=(n=',n,' and _id=',_id,') rename=(incident_date=incident_date',_id,'_',n,' incident_number=incident_number',_id,'_',n,' incident_type=incident_type',_id,'_',n,' violation_type=incident',_id,'_',n,'))')); if last then call execute(';by id;drop _id n;run;'); run;
Xia Keshan
Hi,
You can very simply add an id just by sorting the dataset, then in a datastep retain your_id variable, and us by:
data want;
set have;
retain your_id;
by id;
if first.id then your_id=1;
else your_id=your_id+1;
run;
This can then be proc transposed to get a wide dataset.
My question would be, why? Is there are reason you need a dataset with 6000 columns, how are you planning on working with this data. If you really have to have that kind of configuration, you will want to work with arrays, so you would probably be better off with a 2 dimensional array: array something{x,y} .... I.e. x would be your incidence count, y would be the violation count within that incidence:
Hello RW9, Thank you for your reply. I had tried the method you suggested but adding a regular id will not work. I have tried to clarify my question by including a want statement. The wide dataset will not have 6000 columns as the index variable is "violation_type". Any suggestions on how I can tackle this?
Thank you,
Angi
You only post the sample data , not the output you need yet .
Assuming I understood what you mean.
data have; input id incident_date : date11. incident_number incident_type & $40. violation_type & $40. ; format incident_date date11.; cards; 1 15-Feb-09 94180 Drugs and Drug Paraphernalia Violation of law 1 15-Feb-09 94180 Drugs and Drug Paraphernalia Drugs and drug paraphernalia 2 27-Jul-09 91703 Alcohol Alcohol 2 27-Jul-09 91703 Alcohol Disorderly conduct 2 27-Jul-09 91703 Alcohol Reckless endangerment 2 27-Jul-09 91703 Alcohol Violation of law 2 14-Sep-09 92335 Alcohol Violation of law 2 14-Sep-09 92335 Alcohol Alcohol ; run; data have; set have; by id incident_date incident_number; if first.id then a=0; a+first.incident_number; if first.incident_number then b=0; b+1; run; proc sql noprint; select distinct catt('have(where=(a=',a,' and b=',b,') rename=(violation_type=incident',a,'_',b,'))') into : list separated by ' ' from have; quit; data want; merge &list ; by id incident_date incident_number; drop a b incident_type; run;
Xia Keshan
Hi Xia Keshan,
Thank you for your reply. I have updated my question with a “want” dataset to clarify my question.
I copied the syntax you provided and ran it. I got the following error message:
23 proc sql noprint;
24 select distinct catt('have(where=(a=',a,' and b=',b,')
24 ! rename=(violation_type=incident',a,'_',b,'))')
25 into : list separated by ' '
26 from temp;
ERROR: File WORK.TEMP.DATA does not exist.
27 quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
28 data want;
29 merge &list ;
-
22
200
WARNING: Apparent symbolic reference LIST not resolved.
ERROR: File WORK.LIST.DATA does not exist.
ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, ;, END,
_DATA_, _LAST_, _NULL_.
ERROR 200-322: The symbol is not recognized and will be ignored.
30 by id incident_date incident_number;
31 drop a b incident_type;
32 run;
WARNING: The variable a in the DROP, KEEP, or RENAME list has never been referenced.
WARNING: The variable b in the DROP, KEEP, or RENAME list has never been referenced.
WARNING: The variable incident_type in the DROP, KEEP, or RENAME list has never been referenced.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0
observations and 0 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.00 seconds
I’d appreciate any suggestions you might have.
Thank you,
Angi
Sorry. My bad. Try this one :
data have; input id incident_date : date11. incident_number incident_type & $40. violation_type & $40. ; format incident_date date11.; cards; 1 15-Feb-09 94180 Drugs and Drug Paraphernalia Violation of law 1 15-Feb-09 94180 Drugs and Drug Paraphernalia Drugs and drug paraphernalia 2 27-Jul-09 91703 Alcohol Alcohol 2 27-Jul-09 91703 Alcohol Disorderly conduct 2 27-Jul-09 91703 Alcohol Reckless endangerment 2 27-Jul-09 91703 Alcohol Violation of law 2 14-Sep-09 92335 Alcohol Violation of law 2 14-Sep-09 92335 Alcohol Alcohol ; run; data have; set have; by id incident_date incident_number; if first.id then _id=0; _id+first.incident_number; if first.incident_number then n=0; n+1; run; proc sql ; create table temp as select distinct _id,n from have; quit; data _null_; set temp end=last; if _n_ eq 1 then call execute('data want;merge '); call execute(catt('have(where=(n=',n,' and _id=',_id,') rename=(incident_date=incident_date',_id,'_',n,' incident_number=incident_number',_id,'_',n,' incident_type=incident_type',_id,'_',n,' violation_type=incident',_id,'_',n,'))')); if last then call execute(';by id;drop _id n;run;'); run;
Xia Keshan
This is amazing! Thank you very much
Hi Angi,
Please find below an alternative way of getting the same output, using arrays
data want;
set have;
by id;
if first.id then row=1;
else row+1;
run;
proc sql;
select max(row) into :row from want;
run;
%put &row;
data array;
set want;
format incident_date1-incident_date6 date9.;
retain incident_date1-incident_date6 incident_number1-incident_number6 incident_type1-incident_type6 violation_type1-violation_type6;
array sc(&row) incident_date1-incident_date6;
array sc1(&row) incident_number1-incident_number6;
array sc2(&row) $100 incident_type1-incident_type6;
array sc3(&row) $100 violation_type1-violation_type6;
by id;
if first.id then do;
do i = 1 to &row;
sc(i)=.;
sc1(i)=.;
sc2(i)='';
sc3(i)='';
end;
end;
sc(row)=incident_date;
sc1(row)=incident_number;
sc2(row)=incident_type;
sc3(row)=violation_type;
if last.id;
drop row i;
run;
Hope it helps.
Thanks,
Jag
This is very helpful Jag. Thank you
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.