Solved: Re: Retain + Looping among multiple row records

LLW · Posted 04-06-2015 08:33 PM

Hi everyone,

I'm a student who is relatively new to SAS, and I haven't been able to solve this problem. Thank you in advance!! I have a dataset composed of a couple hundred thousand entries grouped by ID. I want to be able to use the retain function by id and for each of the entries.

An example of my dataset:

ID	DaystoEvent
A	100
A	301
A	499
B	200
B	250
C	100


What I would like in my dataset:

ID	DaystoEvent	1_DaystoEvent	2_DaystoEvent	3_DaystoEvent
A	100	100	301	499
A	301	100	301	499
A	499	100	301	499
B	200	200	250	.
B	250	200	250	.
C	100	100	.	.

Basically, I'd like to use the "retain" function for the variable DaystoEvent for ALL the row records by "ID". I cannot create a simple array because each "ID" has a different number of entries, and my dataset is over 300,000 "Ids". I have figured out how to use the retain function for the first.ID (example below), but I cannot figure out how to do it for all records by ID.

data work;

set work;

by I

Retain R_Daystoevent;

If first.ID then do;

R_Daystoevent=Daystoevent;

end;

Thank you!

Louise

naveen_srini · Posted 04-07-2015 12:44 AM

data work;

input ID $ DaystoEvent;

cards;

A 100

A 301

A 499

B 200

B 250

C 100

;

proc sql; /* copied from Reeza's*/
select max(count) into :max_count separated by ' ' from
(select count(ID) as count from work
group by ID);
quit;

data want;
do _n_=1 by 1 until(last.id);
set work;
by id;
array days(*) Daystoevent1-Daystoevent&max_count;
days(_n_)=DaystoEvent;
end;

do until(last.id);
set work;
by id;
output;
end;
run;

Regards,

Naveen Srinivasan

L&T Infotech

View solution in original post

Reeza · Posted 04-06-2015 08:52 PM

Do you absolutely need the second column?

If not proc transpose is a direct solution.

I also recommend naming your variables Days_to_event1 Days_to_event2. This way you can use the shortcut reference for variables if required.

proc sort data=have; by ID daystoEvent; run;

proc transpose data=have out=want prefix=Days_to_event;

by ID;

var DaysToEvent;

run;

LLW · Posted 04-06-2015 09:12 PM

Hi Fareeza,

Thank you so much--really appreciate it. Unfortunately, I need to keep the second variable (DaysToEvent) because I need to take the difference between DaytoEvent_1- DaystoEvent, DaystoEvent_2-DaystoEvent, etc. Is there a way to keep the DaystoEvent column?

Thank you so much,

Louise

Reeza · Posted 04-06-2015 09:31 PM

I think this is the same: DaysToEvent(n)-daystoevent(1) be the same as DaysToEvent(n)-DaysToEvent?

*Get max count of records;

proc sql;

select max(count) into :max_count from

(select count(ID) as count from Have

group by ID);

quit;

*Transpose;

proc sort data=have; by ID daystoEvent; run;

proc transpose data=have out=want prefix=Days_to_event;

by ID;

var DaysToEvent;

run;

*calculate differences;

data want2;

set want;

array days2(*) days_to_event1-days_to_event&max_count;

array diff(*) days_diff1-days_diff&max_count;

do i=1 to dim(days2);

diff(i)=days2(i)-days2(1);

end;

run;

Ksharp · Posted 04-07-2015 09:06 AM

Did I miss something ? Why not just a simple MERGE ?

data have;
  input ID $ DaystoEvent;
  cards;
A 100
A 301
A 499
B 200
B 250
C 100
;
run;

proc transpose data=have out=temp(drop=_name_) prefix=Days_to_event;
by ID;
var DaysToEvent;
run;
data want;
 merge have temp;
 by id;
run;

Xia Keshan

art297 · Posted 04-06-2015 08:54 PM

Similar to Fareeza's code, but producing the extra lines as well:

data work;

input ID $ DaystoEvent;

cards;

A 100

A 301

A 499

B 200

B 250

C 100

;

proc transpose data=work out=need(drop=_:) prefix=DaystoEvent;

by id;

run;

data want (drop=counter);

length id $5;

format daystoevent best12.;

set need;

by ID;

array days(3) DaystoEvent1-DaystoEvent3;

counter=1;

do while(counter le 3 and not missing(days(counter)));

Daystoevent=Days(counter);

counter+1;

output;

end;

run;

LLW · Posted 04-06-2015 09:19 PM

Hi Arthur,

Thank you so much! Is there a way to make the array number for DaystoEvent a variable as opposed to just 3? This is because in my sample data, I know that the largest number of records per ID is 3. However because I have such a large dataset (300,000 IDs) with varied #'s of records per ID, I'm not sure sure how many variables of DaystoEvent I will need for the array in the actual dataset.

Thank you so much,

Louise

art297 · Posted 04-06-2015 09:27 PM

You could use:

data work;

input ID $ DaystoEvent;

cards;

A 100

A 301

A 499

B 200

B 250

C 100

;

proc transpose data=work out=need(drop=_:) prefix=DaystoEvent_;

by id;

run;

data want (drop=counter);

length id $5;

format daystoevent best12.;

set need;

by ID;

array days(*) DaystoEvent_:;

counter=1;

do while(counter le dim(days) and not missing(days(counter)));

Daystoevent=Days(counter);

counter+1;

output;

end;

run;

naveen_srini · Posted 04-07-2015 12:44 AM

data work;

input ID $ DaystoEvent;

cards;

A 100

A 301

A 499

B 200

B 250

C 100

;

proc sql; /* copied from Reeza's*/
select max(count) into :max_count separated by ' ' from
(select count(ID) as count from work
group by ID);
quit;

data want;
do _n_=1 by 1 until(last.id);
set work;
by id;
array days(*) Daystoevent1-Daystoevent&max_count;
days(_n_)=DaystoEvent;
end;

do until(last.id);
set work;
by id;
output;
end;
run;

Regards,

Naveen Srinivasan

L&T Infotech

RW9 · Posted 04-07-2015 04:47 AM

Hi,

Sorry, still not clear why you need a transposed dataset. As you state, there could be many per id, so you could end up with thousands of columns.

Why not just assign a number in the dataset, then using merging to get days from:

data have;

input ID $ DaystoEvent;

datalines;

A 100

A 301

A 499

B 200

B 250

C 100

;

run;

data have;

set have;

by id;

retain no;

if _n_=1 or first.id then no=1;

else no=no+1;

run;

proc sql;

create table WANT as

select A.ID,

A.NO,

A.DAYSTOEVENT,

A.DAYSTOEVENT - B.DAYSTOEVENT as RES

from WORK.HAVE A

left join (select * from WORK.HAVE where NO=1) B

on A.ID=B.ID

order by A.ID,

A.NO;

quit;

Astounding · Posted 04-07-2015 09:35 AM

At least 90% of the time, this sort of restructuring of the data is not necessary. If you were to post a different question, describing the analysis you would like to perform, you would likely receive many answers that utilize your original structure to the data.

Good luck.

LLW · Posted 04-07-2015 10:35 AM

Hi everyone!

This was my first time posting a question, and thank you to everyone for your help. I have received so many different ways of getting what I wanted. Part of the reason that I wanted all the "DaysToEvents" in columns was because I need to find the difference between all combinations of the difference of DaystoEvents by ID, and I thought that the best way to take differences was to use columns. But Astounding is right, I will have very many columns of new DaystoEvents (233 to be exact!).

I will think about how to move forward with the analysis that I want using this data structure, given the large columns.

Thank you to everyone!

Louise

Retain + Looping among multiple row records

Re: Retain + Looping among multiple row records

Re: Retain + Looping among multiple row records

Re: Retain + Looping among multiple row records

Re: Retain + Looping among multiple row records

Re: Retain + Looping among multiple row records

Re: Retain + Looping among multiple row records

Re: Retain + Looping among multiple row records

Re: Retain + Looping among multiple row records

Re: Retain + Looping among multiple row records

Re: Retain + Looping among multiple row records

Re: Retain + Looping among multiple row records

Re: Retain + Looping among multiple row records

Registration is open

Call for Content EXTENDED

Registration is open

Call for Content EXTENDED

SAS Training: Just a Click Away