BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Kc2
Quartz | Level 8 Kc2
Quartz | Level 8

Hello,

I have a dataset where I have multiple dates per subject on different rows.

I would like to have all the dates into one row. Each row has a number of other parameters.

have data look like this

subject      startdt             enddt                           topic               comments              location

1               1jan2015         31mar2015                 english             good                      buildA

1               1apr2015         30jun2015                 science             average                  LabB

2               1jan2015         28feb2015                  math                  bad                         mathlab

3               1jan2015         28feb2015                  math                good                        mathlab 

3               1mar2015        31mar2015              history                good                        buildA

3               1apr2015         30jun2015                 science             average                  LabB

4              1mar2015        31mar2015              history                good                        buildA

5               1mar2015        31mar2015              english                good                        buildA

5               1apr2015         30jun2015                 science             average                  LabB

 

 

I would like the data to look like

subject startdt1  enddt1  topic1    comments1   location1   startdt2    enddt2   topic2   comments2   location1 startdt3   enddt3      topic3  comments3   location3

1    1jan2015   31mar2015    english     good   buildA      1apr2015  30jun2015    science     average     LabB          

2    1jan2015   28feb2015     math        bad    mathlab

3    1jan2015  28feb2015       math       good   mathlab  1mar2015   31mar2015   history    good   buildA 1apr2015  30jun2015   science             average                  LabB

4    1mar2015   31mar2015   history  good    buildA

5    1mar2015  31mar2015   english   good    buildA     1apr2015   30jun2015     science   average    LabB

 

Thanks fo your help.

 

KC

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

The simplest way to do this is using proc means + idgroup .

Or if you care about efficiency  and more function, you can consider the MERGE skill ,me Matt and Arthur.T proposed .

http://support.sas.com/resources/papers/proceedings15/2785-2015.pdf

 

 

 

data have;
input subject      (startdt             enddt                           topic               comments           location) (: $20.);
cards;  
1               1jan2015         31mar2015                 english             good                      buildA
1               1apr2015         30jun2015                 science             average                  LabB
2               1jan2015         28feb2015                  math                  bad                         mathlab
3               1jan2015         28feb2015                  math                good                        mathlab 
3               1mar2015        31mar2015              history                good                        buildA
3               1apr2015         30jun2015                 science             average                  LabB
4              1mar2015        31mar2015              history                good                        buildA
5               1mar2015        31mar2015              english                good                        buildA
5               1apr2015         30jun2015                 science             average                  LabB
;
run;
proc sql;
select max(n) into : n
 from (select count(*) as n from have group by subject );
quit;
proc summary data=have;
by subject;
output out=want idgroup(out[&n] (startdt   enddt     topic   comments location)=);
run;

View solution in original post

9 REPLIES 9
Reeza
Super User

I would highly recommend you don't. As is, your data can be analyzed nicely to produc summary statistics or answer queries.

If you change the format it becomes harder. 

 

Ie common questions:

How many students took english?

How many students took english and math together?

How many students took english and liked the course?

 

If you change the format, answering each of the question above becomes much harder than it currently is. 

 

What type of analysis/reporting are you doing, where you think this would be helpful. 

ballardw
Super User

An example of what  @Reeza mentions:

 

proc print data=have;
   title "Enrolled in Feb";
   where month(startdt) le 2 le month(enddt);
   var subject startdt enddt topic;
run;title;

Will get a list of students and classes enrolled in Februrary. If you have the data on one line then you would have to search each pair of date values. And considering that one subjec could well be enrolled in more than 2 classes over any period of time you could end up with something so unmangeable (think 8 semesters and maybe 50+ classes within that time frame).

 

Kc2
Quartz | Level 8 Kc2
Quartz | Level 8

Hi Reeza, 

this is a mock dataset i created. The actual one I am working on is for a drug analyis. where the different dates corresponds to period of intake.

So I want to have all the intake period in one row and also their associated side effects.

 

I cannot post the actual dataset, evne a small sample to I created a mock one.

it would be something liek;

id, prd1st ,prd1end ,sideffect in prd1, prd2st ,prd2end, side effect in prd2.

 

Kc

Reeza
Super User

Since you're probably going to anyways, here's a macro that helps transpose the data.

 

http://www.sascommunity.org/wiki/A_Better_Way_to_Flip_(Transpose)_a_SAS_Dataset

Ksharp
Super User

The simplest way to do this is using proc means + idgroup .

Or if you care about efficiency  and more function, you can consider the MERGE skill ,me Matt and Arthur.T proposed .

http://support.sas.com/resources/papers/proceedings15/2785-2015.pdf

 

 

 

data have;
input subject      (startdt             enddt                           topic               comments           location) (: $20.);
cards;  
1               1jan2015         31mar2015                 english             good                      buildA
1               1apr2015         30jun2015                 science             average                  LabB
2               1jan2015         28feb2015                  math                  bad                         mathlab
3               1jan2015         28feb2015                  math                good                        mathlab 
3               1mar2015        31mar2015              history                good                        buildA
3               1apr2015         30jun2015                 science             average                  LabB
4              1mar2015        31mar2015              history                good                        buildA
5               1mar2015        31mar2015              english                good                        buildA
5               1apr2015         30jun2015                 science             average                  LabB
;
run;
proc sql;
select max(n) into : n
 from (select count(*) as n from have group by subject );
quit;
proc summary data=have;
by subject;
output out=want idgroup(out[&n] (startdt   enddt     topic   comments location)=);
run;
Kc2
Quartz | Level 8 Kc2
Quartz | Level 8

Before posting on the forum , I had try 2 methods;

method1

 proc sort  with  dupout option followed by a merge in a data step

proc sort data =have out= stdat dupout=endat nodupkey ; by id;run;

 

data want;

merge stdat endat;

by id;

run;

 

it worked except for the case where  the id has 3 dates.

 

method 2

Then I tried a second method; proc sort by id and start date and then using a data step with the first.id,last.id , not first.id and not last.id

proc sort data =have  ; by id stdate;run;

 

data want_st want_end want_m;

set have;

by id stdate;

if first.id then output want_st;

if not first.id and not last.id then output want_m;

if last.id then output want_end;

run;

 

I have the same problem with  the id which has 3 records.

 

Thank you for your answers I will check the macros tomorrow.

 

 

Kc

 

 

 

Kc2
Quartz | Level 8 Kc2
Quartz | Level 8

Forgot to mention that I tried the tranpose method as well before posting my question on the forum but becasue of the different side effects variable per period of drug intake  the tranpose is not working either.

FreelanceReinh
Jade | Level 19

Transposing data using PROC SUMMARY's IDGROUP option as suggested by @Ksharp is a great idea. You can find background information about this technique and comparisons to more traditional methods in these two articles:

Kc2
Quartz | Level 8 Kc2
Quartz | Level 8

Thank you everyone. Ksharp's solution worked.

Kc

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1788 views
  • 3 likes
  • 5 in conversation