Solved: Re: Reshape long data into wide data

Angi · Posted 01-26-2015 05:52 AM

Hello,

I am working with a long dataset consisting of data for school policy violation by students. In the long format, the dataset has more than 6000 cases and 20 variables. The variables of interest for this question are: id, incident_date, incident_number, incident_type, and violation type.

As shown in the hypothetical example below, a) a student can have multiple incident numbers and b) a student can have different types of violations that fall under the same incident type; and c) different students have different number of incidents and violations.

How do I reshape this long dataset into a wide dataset. I am trying to create an index variable but have not succeeded in doing so .

Thank you for your help.

--Angi

I need to reshape this long dataset into a wide dataset in the following manner:

Have:

id	incident_date	incident_number	incident_type	violation_type
1	15-Feb-09	94180	Drugs and Drug Paraphernalia	Violation of law
1	15-Feb-09	94180	Drugs and Drug Paraphernalia	Drugs and drug paraphernalia
2	27-Jul-09	91703	Alcohol	Alcohol
2	27-Jul-09	91703	Alcohol	Disorderly conduct
2	27-Jul-09	91703	Alcohol	Reckless endangerment
2	27-Jul-09	91703	Alcohol	Violation of law
2	14-Sep-09	92335	Alcohol	Violation of law
2	14-Sep-09	92335	Alcohol	Alcohol
3	31-Oct-09	93071	Polysubstance	Disorderly conduct
3	31-Oct-09	93071	Polysubstance	Violation of law
3	16-Nov-09	93273	Theft or damage to property	Theft or damage to property
3	16-Nov-09	93273	Theft or damage to property	Violation of law
3	16-Nov-09	93273	Theft or damage to property	Alcohol
4	11-Jul-09	91697	Drugs and Drug Paraphernalia	Violation of law
4	11-Jul-09	91697	Drugs and Drug Paraphernalia	Drugs and drug paraphernalia
4	5-Dec-09	93443	Alcohol - Medical	Violation of law
4	5-Dec-09	93443	Alcohol - Medical	Alcohol
4	13-Mar-10	94360	Trespassing	Trespassing

Want:

id	incident_ date1_1	incident_ number1_1	incident_ type1_1	incident 1_1	incident_ date 1_2	incident_ number 1_2	incident_ type1_2	Incident1_2	Incident_date1_3
1	15-Feb-09	94180	Drugs and Drug Paraphernalia	Violation of law	15-Feb-09	94180	Drugs and Drug Paraphernalia	Drugs and drug paraphernalia
2	27-Jul-09	91703	Alcohol	Alcohol	27-Jul-09	91703	Alcohol	Disorderly conduct	27-Jul-09
3	31-Oct-09	93071	Polysubstance	Disorderly conduct	31-Oct-09	93071	Polysubstance	Violation of law

where,

incident1_1 would represent the 1st type of violation for 1st incident

incident1_2 would represent the 2nd type of violation for 1st incident

incident1_3 would represent the 3rd type of violation for 1st incident

incident2_1 would represent the 1st type of violation for 2nd incident

and so on..

Message was edited by: Angi Stha

Ksharp · Posted 01-27-2015 04:56 AM

Sorry. My bad. Try this one :

data have;
input id incident_date : date11. incident_number incident_type & $40. violation_type & $40.     ;
format incident_date  date11.;
cards;
1 15-Feb-09 94180  Drugs and Drug Paraphernalia   Violation of law
1 15-Feb-09 94180  Drugs and Drug Paraphernalia   Drugs and drug paraphernalia
2 27-Jul-09 91703  Alcohol   Alcohol
2 27-Jul-09 91703  Alcohol   Disorderly conduct
2 27-Jul-09 91703  Alcohol   Reckless endangerment
2 27-Jul-09 91703  Alcohol   Violation of law
2 14-Sep-09 92335  Alcohol   Violation of law
2 14-Sep-09 92335  Alcohol   Alcohol
;
run;
data have;
 set have;
 by   id incident_date incident_number;
 if first.id then _id=0;
 _id+first.incident_number;
 if first.incident_number then n=0;
 n+1;
run;
proc sql ;
create table temp as
 select distinct _id,n from have;
quit;
data _null_;
 set temp end=last;
 if _n_ eq 1 then call execute('data want;merge ');
 call execute(catt('have(where=(n=',n,' and _id=',_id,') rename=(incident_date=incident_date',_id,'_',n,' incident_number=incident_number',_id,'_',n,' incident_type=incident_type',_id,'_',n,' violation_type=incident',_id,'_',n,'))'));
 if last then call execute(';by id;drop _id n;run;');
run;

Xia Keshan

View solution in original post

RW9 · Posted 01-26-2015 06:15 AM

Hi,

You can very simply add an id just by sorting the dataset, then in a datastep retain your_id variable, and us by:

data want;

set have;

retain your_id;

by id;

if first.id then your_id=1;

else your_id=your_id+1;

run;

This can then be proc transposed to get a wide dataset.

My question would be, why? Is there are reason you need a dataset with 6000 columns, how are you planning on working with this data. If you really have to have that kind of configuration, you will want to work with arrays, so you would probably be better off with a 2 dimensional array: array something{x,y} .... I.e. x would be your incidence count, y would be the violation count within that incidence:

Angi · Posted 01-26-2015 04:28 PM

Hello RW9, Thank you for your reply. I had tried the method you suggested but adding a regular id will not work. I have tried to clarify my question by including a want statement. The wide dataset will not have 6000 columns as the index variable is "violation_type". Any suggestions on how I can tackle this?

Thank you,

Angi

Ksharp · Posted 01-26-2015 08:02 AM

You only post the sample data , not the output you need yet .

Assuming I understood what you mean.

data have;
input id incident_date : date11. incident_number incident_type & $40. violation_type & $40.     ;
format incident_date  date11.;
cards;
1 15-Feb-09 94180  Drugs and Drug Paraphernalia   Violation of law
1 15-Feb-09 94180  Drugs and Drug Paraphernalia   Drugs and drug paraphernalia
2 27-Jul-09 91703  Alcohol   Alcohol
2 27-Jul-09 91703  Alcohol   Disorderly conduct
2 27-Jul-09 91703  Alcohol   Reckless endangerment
2 27-Jul-09 91703  Alcohol   Violation of law
2 14-Sep-09 92335  Alcohol   Violation of law
2 14-Sep-09 92335  Alcohol   Alcohol
;
run;
data have;
 set have;
 by   id incident_date  incident_number;
 if first.id then a=0;
 a+first.incident_number;
 if first.incident_number then b=0;
 b+1;
run;
proc sql noprint;
 select distinct catt('have(where=(a=',a,' and b=',b,') rename=(violation_type=incident',a,'_',b,'))') 
   into : list separated by ' '
    from have;
quit;
data want;
 merge &list ;
 by id incident_date  incident_number;
 drop a b incident_type;
run;

Xia Keshan

Angi · Posted 01-26-2015 04:39 PM

Hi Xia Keshan,

Thank you for your reply. I have updated my question with a “want” dataset to clarify my question.

I copied the syntax you provided and ran it. I got the following error message:

23 proc sql noprint;

24 select distinct catt('have(where=(a=',a,' and b=',b,')

24 ! rename=(violation_type=incident',a,'_',b,'))')

25 into : list separated by ' '

26 from temp;

ERROR: File WORK.TEMP.DATA does not exist.

27 quit;

NOTE: The SAS System stopped processing this step because of errors.

NOTE: PROCEDURE SQL used (Total process time):

real time 0.00 seconds

cpu time 0.00 seconds

28 data want;

29 merge &list ;

-

22

200

WARNING: Apparent symbolic reference LIST not resolved.

ERROR: File WORK.LIST.DATA does not exist.

ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, ;, END,

_DATA_, _LAST_, _NULL_.

ERROR 200-322: The symbol is not recognized and will be ignored.

30 by id incident_date incident_number;

31 drop a b incident_type;

32 run;

WARNING: The variable a in the DROP, KEEP, or RENAME list has never been referenced.

WARNING: The variable b in the DROP, KEEP, or RENAME list has never been referenced.

WARNING: The variable incident_type in the DROP, KEEP, or RENAME list has never been referenced.

NOTE: The SAS System stopped processing this step because of errors.

WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0

observations and 0 variables.

NOTE: DATA statement used (Total process time):

real time 0.03 seconds

cpu time 0.00 seconds

I’d appreciate any suggestions you might have.

Thank you,

Angi

Ksharp · Posted 01-27-2015 04:56 AM

Sorry. My bad. Try this one :

data have;
input id incident_date : date11. incident_number incident_type & $40. violation_type & $40.     ;
format incident_date  date11.;
cards;
1 15-Feb-09 94180  Drugs and Drug Paraphernalia   Violation of law
1 15-Feb-09 94180  Drugs and Drug Paraphernalia   Drugs and drug paraphernalia
2 27-Jul-09 91703  Alcohol   Alcohol
2 27-Jul-09 91703  Alcohol   Disorderly conduct
2 27-Jul-09 91703  Alcohol   Reckless endangerment
2 27-Jul-09 91703  Alcohol   Violation of law
2 14-Sep-09 92335  Alcohol   Violation of law
2 14-Sep-09 92335  Alcohol   Alcohol
;
run;
data have;
 set have;
 by   id incident_date incident_number;
 if first.id then _id=0;
 _id+first.incident_number;
 if first.incident_number then n=0;
 n+1;
run;
proc sql ;
create table temp as
 select distinct _id,n from have;
quit;
data _null_;
 set temp end=last;
 if _n_ eq 1 then call execute('data want;merge ');
 call execute(catt('have(where=(n=',n,' and _id=',_id,') rename=(incident_date=incident_date',_id,'_',n,' incident_number=incident_number',_id,'_',n,' incident_type=incident_type',_id,'_',n,' violation_type=incident',_id,'_',n,'))'));
 if last then call execute(';by id;drop _id n;run;');
run;

Xia Keshan

Angi · Posted 01-27-2015 05:09 PM

This is amazing! Thank you very much

Jagadishkatam · Posted 01-27-2015 08:14 PM

Hi Angi,

Please find below an alternative way of getting the same output, using arrays

data want;

set have;

by id;

if first.id then row=1;

else row+1;

run;

proc sql;

select max(row) into :row from want;

run;

%put &row;

data array;

set want;

format incident_date1-incident_date6 date9.;

retain incident_date1-incident_date6 incident_number1-incident_number6 incident_type1-incident_type6 violation_type1-violation_type6;

array sc(&row) incident_date1-incident_date6;

array sc1(&row) incident_number1-incident_number6;

array sc2(&row) $100 incident_type1-incident_type6;

array sc3(&row) $100 violation_type1-violation_type6;

by id;

if first.id then do;

do i = 1 to &row;

sc(i)=.;

sc1(i)=.;

sc2(i)='';

sc3(i)='';

end;

sc(row)=incident_date;

sc1(row)=incident_number;

sc2(row)=incident_type;

sc3(row)=violation_type;

if last.id;

drop row i;

run;

Hope it helps.

Thanks,

Jag

Thanks,
Jag

Angi · Posted 01-28-2015 01:47 AM

This is very helpful Jag. Thank you

SAS Innovate 2025: Call for Content

Classroom Training Available!