BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Angi
Obsidian | Level 7

Hello,

I am working with a long dataset consisting of data for school policy violation by students. In the long format, the dataset has more than 6000 cases and 20 variables. The variables of interest for this question are: id, incident_date, incident_number, incident_type, and violation type.

As shown in the hypothetical example below, a) a student can have multiple incident numbers and b) a student can have different types of violations that fall under the same incident type; and c) different students have different number of incidents and violations.

How do I reshape this long dataset into a wide dataset. I am trying to create an index variable but have not succeeded in doing so .


Thank you for your help.

--Angi

I need to reshape this long dataset into a wide dataset in the following manner:


Have:

id

incident_date

incident_number

incident_type

violation_type

1

15-Feb-09

94180

Drugs and Drug Paraphernalia

Violation of law

1

15-Feb-09

94180

Drugs and Drug Paraphernalia

Drugs and drug paraphernalia

2

27-Jul-09

91703

Alcohol

Alcohol

2

27-Jul-09

91703

Alcohol

Disorderly conduct

2

27-Jul-09

91703

Alcohol

Reckless endangerment

2

27-Jul-09

91703

Alcohol

Violation of law

2

14-Sep-09

92335

Alcohol

Violation of law

2

14-Sep-09

92335

Alcohol

Alcohol

3

31-Oct-09

93071

Polysubstance

Disorderly conduct

3

31-Oct-09

93071

Polysubstance

Violation of law

3

16-Nov-09

93273

Theft or damage to property

Theft or damage to property

3

16-Nov-09

93273

Theft or damage to property

Violation of law

3

16-Nov-09

93273

Theft or damage to property

Alcohol

4

11-Jul-09

91697

Drugs and Drug Paraphernalia

Violation of law

4

11-Jul-09

91697

Drugs and Drug Paraphernalia

Drugs and drug paraphernalia

4

5-Dec-09

93443

Alcohol - Medical

Violation of law

4

5-Dec-09

93443

Alcohol - Medical

Alcohol

4

13-Mar-10

94360

Trespassing

Trespassing

Want:

id

incident_

date1_1

incident_

number1_1

incident_

type1_1

incident

1_1

incident_

date

1_2

incident_

number

1_2

incident_

type1_2

Incident1_2

Incident_date1_3

1

15-Feb-09

94180

Drugs and Drug Paraphernalia

Violation of law

15-Feb-09

94180

Drugs and Drug Paraphernalia

Drugs and drug paraphernalia

2

27-Jul-09

91703

Alcohol

Alcohol

27-Jul-09

91703

Alcohol

Disorderly conduct

27-Jul-09

3

31-Oct-09

93071

Polysubstance

Disorderly conduct

31-Oct-09

93071

Polysubstance

Violation of law

where,

incident1_1 would represent the 1st type of violation for 1st incident

incident1_2 would represent the 2nd type of violation for 1st incident

incident1_3 would represent the 3rd type of violation for 1st incident

incident2_1 would represent the 1st type of violation for 2nd incident

and so on..

Message was edited by: Angi Stha

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

Sorry. My bad. Try this one :

data have;
input id incident_date : date11. incident_number incident_type & $40. violation_type & $40.     ;
format incident_date  date11.;
cards;
1 15-Feb-09 94180  Drugs and Drug Paraphernalia   Violation of law
1 15-Feb-09 94180  Drugs and Drug Paraphernalia   Drugs and drug paraphernalia
2 27-Jul-09 91703  Alcohol   Alcohol
2 27-Jul-09 91703  Alcohol   Disorderly conduct
2 27-Jul-09 91703  Alcohol   Reckless endangerment
2 27-Jul-09 91703  Alcohol   Violation of law
2 14-Sep-09 92335  Alcohol   Violation of law
2 14-Sep-09 92335  Alcohol   Alcohol
;
run;
data have;
 set have;
 by   id incident_date incident_number;
 if first.id then _id=0;
 _id+first.incident_number;
 if first.incident_number then n=0;
 n+1;
run;
proc sql ;
create table temp as
 select distinct _id,n from have;
quit;
data _null_;
 set temp end=last;
 if _n_ eq 1 then call execute('data want;merge ');
 call execute(catt('have(where=(n=',n,' and _id=',_id,') rename=(incident_date=incident_date',_id,'_',n,' incident_number=incident_number',_id,'_',n,' incident_type=incident_type',_id,'_',n,' violation_type=incident',_id,'_',n,'))'));
 if last then call execute(';by id;drop _id n;run;');
run;

Xia Keshan

View solution in original post

8 REPLIES 8
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

You can very simply add an id just by sorting the dataset, then in a datastep retain your_id variable, and us by:

data want;

     set have;

     retain your_id;

     by id;

     if first.id then your_id=1;

     else your_id=your_id+1;

run;

This can then be proc transposed to get a wide dataset.

My question would be, why?  Is there are reason you need a dataset with 6000 columns, how are you planning on working with this data.  If you really have to have that kind of configuration, you will want to work with arrays, so you would probably be better off with a 2 dimensional array: array something{x,y} ....  I.e. x would be your incidence count, y would be the violation count within that incidence:

Angi
Obsidian | Level 7

Hello RW9, Thank you for your reply. I had tried the method you suggested but adding a regular id will not work. I have tried to clarify my question by including a want statement. The wide dataset will not have 6000 columns as the index variable is "violation_type". Any suggestions on how I can tackle this?

Thank you,

Angi

Ksharp
Super User

You only post the sample data , not the output you need yet .

Assuming I understood what you mean.

data have;
input id incident_date : date11. incident_number incident_type & $40. violation_type & $40.     ;
format incident_date  date11.;
cards;
1 15-Feb-09 94180  Drugs and Drug Paraphernalia   Violation of law
1 15-Feb-09 94180  Drugs and Drug Paraphernalia   Drugs and drug paraphernalia
2 27-Jul-09 91703  Alcohol   Alcohol
2 27-Jul-09 91703  Alcohol   Disorderly conduct
2 27-Jul-09 91703  Alcohol   Reckless endangerment
2 27-Jul-09 91703  Alcohol   Violation of law
2 14-Sep-09 92335  Alcohol   Violation of law
2 14-Sep-09 92335  Alcohol   Alcohol
;
run;
data have;
 set have;
 by   id incident_date  incident_number;
 if first.id then a=0;
 a+first.incident_number;
 if first.incident_number then b=0;
 b+1;
run;
proc sql noprint;
 select distinct catt('have(where=(a=',a,' and b=',b,') rename=(violation_type=incident',a,'_',b,'))') 
   into : list separated by ' '
    from have;
quit;
data want;
 merge &list ;
 by id incident_date  incident_number;
 drop a b incident_type;
run;

Xia Keshan

Angi
Obsidian | Level 7

Hi Xia Keshan,

Thank you for your reply. I have updated my question with a “want” dataset to clarify my question.

I copied the syntax you provided and ran it. I got the following error message:

23   proc sql noprint;

24    select distinct catt('have(where=(a=',a,' and b=',b,')

24 ! rename=(violation_type=incident',a,'_',b,'))')

25      into : list separated by ' '

26       from temp;

ERROR: File WORK.TEMP.DATA does not exist.

27   quit;

NOTE: The SAS System stopped processing this step because of errors.

NOTE: PROCEDURE SQL used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

28   data want;

29    merge &list ;

            -

            22

            200

WARNING: Apparent symbolic reference LIST not resolved.

ERROR: File WORK.LIST.DATA does not exist.

ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, ;, END,

              _DATA_, _LAST_, _NULL_.

ERROR 200-322: The symbol is not recognized and will be ignored.

30    by id incident_date  incident_number;

31    drop a b incident_type;

32   run;

WARNING: The variable a in the DROP, KEEP, or RENAME list has never been referenced.

WARNING: The variable b in the DROP, KEEP, or RENAME list has never been referenced.

WARNING: The variable incident_type in the DROP, KEEP, or RENAME list has never been referenced.

NOTE: The SAS System stopped processing this step because of errors.

WARNING: The data set WORK.WANT may be incomplete.  When this step was stopped there were 0

         observations and 0 variables.

NOTE: DATA statement used (Total process time):

      real time           0.03 seconds

      cpu time            0.00 seconds

I’d appreciate any suggestions you might have.

Thank you,

Angi

Ksharp
Super User

Sorry. My bad. Try this one :

data have;
input id incident_date : date11. incident_number incident_type & $40. violation_type & $40.     ;
format incident_date  date11.;
cards;
1 15-Feb-09 94180  Drugs and Drug Paraphernalia   Violation of law
1 15-Feb-09 94180  Drugs and Drug Paraphernalia   Drugs and drug paraphernalia
2 27-Jul-09 91703  Alcohol   Alcohol
2 27-Jul-09 91703  Alcohol   Disorderly conduct
2 27-Jul-09 91703  Alcohol   Reckless endangerment
2 27-Jul-09 91703  Alcohol   Violation of law
2 14-Sep-09 92335  Alcohol   Violation of law
2 14-Sep-09 92335  Alcohol   Alcohol
;
run;
data have;
 set have;
 by   id incident_date incident_number;
 if first.id then _id=0;
 _id+first.incident_number;
 if first.incident_number then n=0;
 n+1;
run;
proc sql ;
create table temp as
 select distinct _id,n from have;
quit;
data _null_;
 set temp end=last;
 if _n_ eq 1 then call execute('data want;merge ');
 call execute(catt('have(where=(n=',n,' and _id=',_id,') rename=(incident_date=incident_date',_id,'_',n,' incident_number=incident_number',_id,'_',n,' incident_type=incident_type',_id,'_',n,' violation_type=incident',_id,'_',n,'))'));
 if last then call execute(';by id;drop _id n;run;');
run;

Xia Keshan

Angi
Obsidian | Level 7

This is amazing! Thank you very much Smiley Happy

Jagadishkatam
Amethyst | Level 16

Hi Angi,

Please find below an alternative way of getting the same output, using arrays

data want;

set have;

by id;

if first.id then row=1;

else row+1;

run;

proc sql;

select max(row) into :row from want;

run;

%put &row;

data array;

  set want;

  format incident_date1-incident_date6  date9.;

  retain incident_date1-incident_date6 incident_number1-incident_number6 incident_type1-incident_type6 violation_type1-violation_type6;

  array sc(&row)  incident_date1-incident_date6;

array sc1(&row)  incident_number1-incident_number6;

array sc2(&row) $100 incident_type1-incident_type6;

array sc3(&row) $100 violation_type1-violation_type6;

by id;

if first.id then do;

do i = 1 to &row;

sc(i)=.;

sc1(i)=.;

sc2(i)='';

sc3(i)='';

end;

end;

sc(row)=incident_date;

sc1(row)=incident_number;

sc2(row)=incident_type;

sc3(row)=violation_type;

if last.id;

drop row i;

run;

Hope it helps.

Thanks,

Jag

Thanks,
Jag
Angi
Obsidian | Level 7

This is very helpful Jag. Thank you Smiley Happy

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1734 views
  • 4 likes
  • 4 in conversation