Hi everyone,
I'm a student who is relatively new to SAS, and I haven't been able to solve this problem. Thank you in advance!! I have a dataset composed of a couple hundred thousand entries grouped by ID. I want to be able to use the retain function by id and for each of the entries.
An example of my dataset:
ID | DaystoEvent |
A | 100 |
A | 301 |
A | 499 |
B | 200 |
B | 250 |
C | 100 |
What I would like in my dataset: |
ID | DaystoEvent | 1_DaystoEvent | 2_DaystoEvent | 3_DaystoEvent |
A | 100 | 100 | 301 | 499 |
A | 301 | 100 | 301 | 499 |
A | 499 | 100 | 301 | 499 |
B | 200 | 200 | 250 | . |
B | 250 | 200 | 250 | . |
C | 100 | 100 | . | . |
Basically, I'd like to use the "retain" function for the variable DaystoEvent for ALL the row records by "ID". I cannot create a simple array because each "ID" has a different number of entries, and my dataset is over 300,000 "Ids". I have figured out how to use the retain function for the first.ID (example below), but I cannot figure out how to do it for all records by ID.
data work;
set work;
by I
Retain R_Daystoevent;
If first.ID then do;
R_Daystoevent=Daystoevent;
end;
Thank you!
Louise
data work;
input ID $ DaystoEvent;
cards;
A 100
A 301
A 499
B 200
B 250
C 100
;
proc sql; /* copied from Reeza's*/
select max(count) into :max_count separated by ' ' from
(select count(ID) as count from work
group by ID);
quit;
data want;
do _n_=1 by 1 until(last.id);
set work;
by id;
array days(*) Daystoevent1-Daystoevent&max_count;
days(_n_)=DaystoEvent;
end;
do until(last.id);
set work;
by id;
output;
end;
run;
Regards,
Naveen Srinivasan
L&T Infotech
Do you absolutely need the second column?
If not proc transpose is a direct solution.
I also recommend naming your variables Days_to_event1 Days_to_event2. This way you can use the shortcut reference for variables if required.
proc sort data=have; by ID daystoEvent; run;
proc transpose data=have out=want prefix=Days_to_event;
by ID;
var DaysToEvent;
run;
Hi Fareeza,
Thank you so much--really appreciate it. Unfortunately, I need to keep the second variable (DaysToEvent) because I need to take the difference between DaytoEvent_1- DaystoEvent, DaystoEvent_2-DaystoEvent, etc. Is there a way to keep the DaystoEvent column?
Thank you so much,
Louise
I think this is the same: DaysToEvent(n)-daystoevent(1) be the same as DaysToEvent(n)-DaysToEvent?
*Get max count of records;
proc sql;
select max(count) into :max_count from
(select count(ID) as count from Have
group by ID);
quit;
*Transpose;
proc sort data=have; by ID daystoEvent; run;
proc transpose data=have out=want prefix=Days_to_event;
by ID;
var DaysToEvent;
run;
*calculate differences;
data want2;
set want;
array days2(*) days_to_event1-days_to_event&max_count;
array diff(*) days_diff1-days_diff&max_count;
do i=1 to dim(days2);
diff(i)=days2(i)-days2(1);
end;
run;
Did I miss something ? Why not just a simple MERGE ?
data have; input ID $ DaystoEvent; cards; A 100 A 301 A 499 B 200 B 250 C 100 ; run; proc transpose data=have out=temp(drop=_name_) prefix=Days_to_event; by ID; var DaysToEvent; run; data want; merge have temp; by id; run;
Xia Keshan
Similar to Fareeza's code, but producing the extra lines as well:
data work;
input ID $ DaystoEvent;
cards;
A 100
A 301
A 499
B 200
B 250
C 100
;
proc transpose data=work out=need(drop=_:) prefix=DaystoEvent;
by id;
run;
data want (drop=counter);
length id $5;
format daystoevent best12.;
set need;
by ID;
array days(3) DaystoEvent1-DaystoEvent3;
counter=1;
do while(counter le 3 and not missing(days(counter)));
Daystoevent=Days(counter);
counter+1;
output;
end;
run;
Hi Arthur,
Thank you so much! Is there a way to make the array number for DaystoEvent a variable as opposed to just 3? This is because in my sample data, I know that the largest number of records per ID is 3. However because I have such a large dataset (300,000 IDs) with varied #'s of records per ID, I'm not sure sure how many variables of DaystoEvent I will need for the array in the actual dataset.
Thank you so much,
Louise
You could use:
data work;
input ID $ DaystoEvent;
cards;
A 100
A 301
A 499
B 200
B 250
C 100
;
proc transpose data=work out=need(drop=_:) prefix=DaystoEvent_;
by id;
run;
data want (drop=counter);
length id $5;
format daystoevent best12.;
set need;
by ID;
array days(*) DaystoEvent_:;
counter=1;
do while(counter le dim(days) and not missing(days(counter)));
Daystoevent=Days(counter);
counter+1;
output;
end;
run;
data work;
input ID $ DaystoEvent;
cards;
A 100
A 301
A 499
B 200
B 250
C 100
;
proc sql; /* copied from Reeza's*/
select max(count) into :max_count separated by ' ' from
(select count(ID) as count from work
group by ID);
quit;
data want;
do _n_=1 by 1 until(last.id);
set work;
by id;
array days(*) Daystoevent1-Daystoevent&max_count;
days(_n_)=DaystoEvent;
end;
do until(last.id);
set work;
by id;
output;
end;
run;
Regards,
Naveen Srinivasan
L&T Infotech
Hi,
Sorry, still not clear why you need a transposed dataset. As you state, there could be many per id, so you could end up with thousands of columns.
Why not just assign a number in the dataset, then using merging to get days from:
data have;
input ID $ DaystoEvent;
datalines;
A 100
A 301
A 499
B 200
B 250
C 100
;
run;
data have;
set have;
by id;
retain no;
if _n_=1 or first.id then no=1;
else no=no+1;
run;
proc sql;
create table WANT as
select A.ID,
A.NO,
A.DAYSTOEVENT,
A.DAYSTOEVENT - B.DAYSTOEVENT as RES
from WORK.HAVE A
left join (select * from WORK.HAVE where NO=1) B
on A.ID=B.ID
order by A.ID,
A.NO;
quit;
At least 90% of the time, this sort of restructuring of the data is not necessary. If you were to post a different question, describing the analysis you would like to perform, you would likely receive many answers that utilize your original structure to the data.
Good luck.
Hi everyone!
This was my first time posting a question, and thank you to everyone for your help. I have received so many different ways of getting what I wanted. Part of the reason that I wanted all the "DaysToEvents" in columns was because I need to find the difference between all combinations of the difference of DaystoEvents by ID, and I thought that the best way to take differences was to use columns. But Astounding is right, I will have very many columns of new DaystoEvents (233 to be exact!).
I will think about how to move forward with the analysis that I want using this data structure, given the large columns.
Thank you to everyone!
Louise
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.