Hello,
I have a dataset where I have multiple dates per subject on different rows.
I would like to have all the dates into one row. Each row has a number of other parameters.
have data look like this
subject startdt enddt topic comments location
1 1jan2015 31mar2015 english good buildA
1 1apr2015 30jun2015 science average LabB
2 1jan2015 28feb2015 math bad mathlab
3 1jan2015 28feb2015 math good mathlab
3 1mar2015 31mar2015 history good buildA
3 1apr2015 30jun2015 science average LabB
4 1mar2015 31mar2015 history good buildA
5 1mar2015 31mar2015 english good buildA
5 1apr2015 30jun2015 science average LabB
I would like the data to look like
subject startdt1 enddt1 topic1 comments1 location1 startdt2 enddt2 topic2 comments2 location1 startdt3 enddt3 topic3 comments3 location3
1 1jan2015 31mar2015 english good buildA 1apr2015 30jun2015 science average LabB
2 1jan2015 28feb2015 math bad mathlab
3 1jan2015 28feb2015 math good mathlab 1mar2015 31mar2015 history good buildA 1apr2015 30jun2015 science average LabB
4 1mar2015 31mar2015 history good buildA
5 1mar2015 31mar2015 english good buildA 1apr2015 30jun2015 science average LabB
Thanks fo your help.
KC
The simplest way to do this is using proc means + idgroup .
Or if you care about efficiency and more function, you can consider the MERGE skill ,me Matt and Arthur.T proposed .
http://support.sas.com/resources/papers/proceedings15/2785-2015.pdf
data have;
input subject (startdt enddt topic comments location) (: $20.);
cards;
1 1jan2015 31mar2015 english good buildA
1 1apr2015 30jun2015 science average LabB
2 1jan2015 28feb2015 math bad mathlab
3 1jan2015 28feb2015 math good mathlab
3 1mar2015 31mar2015 history good buildA
3 1apr2015 30jun2015 science average LabB
4 1mar2015 31mar2015 history good buildA
5 1mar2015 31mar2015 english good buildA
5 1apr2015 30jun2015 science average LabB
;
run;
proc sql;
select max(n) into : n
from (select count(*) as n from have group by subject );
quit;
proc summary data=have;
by subject;
output out=want idgroup(out[&n] (startdt enddt topic comments location)=);
run;
I would highly recommend you don't. As is, your data can be analyzed nicely to produc summary statistics or answer queries.
If you change the format it becomes harder.
Ie common questions:
How many students took english?
How many students took english and math together?
How many students took english and liked the course?
If you change the format, answering each of the question above becomes much harder than it currently is.
What type of analysis/reporting are you doing, where you think this would be helpful.
An example of what @Reeza mentions:
proc print data=have;
title "Enrolled in Feb";
where month(startdt) le 2 le month(enddt);
var subject startdt enddt topic;
run;title;
Will get a list of students and classes enrolled in Februrary. If you have the data on one line then you would have to search each pair of date values. And considering that one subjec could well be enrolled in more than 2 classes over any period of time you could end up with something so unmangeable (think 8 semesters and maybe 50+ classes within that time frame).
Hi Reeza,
this is a mock dataset i created. The actual one I am working on is for a drug analyis. where the different dates corresponds to period of intake.
So I want to have all the intake period in one row and also their associated side effects.
I cannot post the actual dataset, evne a small sample to I created a mock one.
it would be something liek;
id, prd1st ,prd1end ,sideffect in prd1, prd2st ,prd2end, side effect in prd2.
Kc
Since you're probably going to anyways, here's a macro that helps transpose the data.
http://www.sascommunity.org/wiki/A_Better_Way_to_Flip_(Transpose)_a_SAS_Dataset
The simplest way to do this is using proc means + idgroup .
Or if you care about efficiency and more function, you can consider the MERGE skill ,me Matt and Arthur.T proposed .
http://support.sas.com/resources/papers/proceedings15/2785-2015.pdf
data have;
input subject (startdt enddt topic comments location) (: $20.);
cards;
1 1jan2015 31mar2015 english good buildA
1 1apr2015 30jun2015 science average LabB
2 1jan2015 28feb2015 math bad mathlab
3 1jan2015 28feb2015 math good mathlab
3 1mar2015 31mar2015 history good buildA
3 1apr2015 30jun2015 science average LabB
4 1mar2015 31mar2015 history good buildA
5 1mar2015 31mar2015 english good buildA
5 1apr2015 30jun2015 science average LabB
;
run;
proc sql;
select max(n) into : n
from (select count(*) as n from have group by subject );
quit;
proc summary data=have;
by subject;
output out=want idgroup(out[&n] (startdt enddt topic comments location)=);
run;
Before posting on the forum , I had try 2 methods;
method1
proc sort with dupout option followed by a merge in a data step
proc sort data =have out= stdat dupout=endat nodupkey ; by id;run;
data want;
merge stdat endat;
by id;
run;
it worked except for the case where the id has 3 dates.
method 2
Then I tried a second method; proc sort by id and start date and then using a data step with the first.id,last.id , not first.id and not last.id
proc sort data =have ; by id stdate;run;
data want_st want_end want_m;
set have;
by id stdate;
if first.id then output want_st;
if not first.id and not last.id then output want_m;
if last.id then output want_end;
run;
I have the same problem with the id which has 3 records.
Thank you for your answers I will check the macros tomorrow.
Kc
Forgot to mention that I tried the tranpose method as well before posting my question on the forum but becasue of the different side effects variable per period of drug intake the tranpose is not working either.
Transposing data using PROC SUMMARY's IDGROUP option as suggested by @Ksharp is a great idea. You can find background information about this technique and comparisons to more traditional methods in these two articles:
Thank you everyone. Ksharp's solution worked.
Kc
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.