BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
LLW
Fluorite | Level 6 LLW
Fluorite | Level 6

Hi everyone,

I'm a student who is relatively new to SAS, and I haven't been able to solve this problem. Thank you in advance!! I have a dataset composed of a couple hundred thousand entries grouped by ID. I want to be able to use the retain function by id and for each of the entries.

An example of my dataset:

IDDaystoEvent
A100
A301
A499
B200
B250
C100

What I would like in my dataset:
IDDaystoEvent1_DaystoEvent2_DaystoEvent3_DaystoEvent
A100100301499
A301100301499
A499100301499
B200200250.
B250200250.
C100100..

Basically, I'd like to use the "retain" function for the variable DaystoEvent for ALL the row records by "ID". I cannot create a simple array because each "ID" has a different number of entries, and my dataset is over 300,000 "Ids". I have figured out how to use the retain function for the first.ID (example below), but I cannot figure out how to do it for all records by ID.

data work;

set work;

by I

Retain R_Daystoevent;

If first.ID then do;

R_Daystoevent=Daystoevent;

end;

Thank you!

Louise

1 ACCEPTED SOLUTION

Accepted Solutions
naveen_srini
Quartz | Level 8

data work;

  input ID $ DaystoEvent;

  cards;

A 100

A 301

A 499

B 200

B 250

C 100

;


proc sql; /* copied from Reeza's*/
select max(count) into :max_count separated by ' ' from
(select count(ID) as count from work
  group by ID);
quit;


data want;
do _n_=1 by 1 until(last.id);
  set work;
  by id;
  array days(*) Daystoevent1-Daystoevent&max_count;
  days(_n_)=DaystoEvent;
end;

do until(last.id);
  set work;
  by id;
  output;
end;
run;

Regards,

Naveen Srinivasan

L&T Infotech

View solution in original post

11 REPLIES 11
Reeza
Super User

Do you absolutely need the second column?

If not proc transpose is a direct solution.


I also recommend naming your variables Days_to_event1 Days_to_event2. This way you can use the shortcut reference for variables if required.


proc sort data=have; by ID daystoEvent; run;


proc transpose data=have out=want prefix=Days_to_event;

by ID;

var DaysToEvent;

run;

LLW
Fluorite | Level 6 LLW
Fluorite | Level 6

Hi Fareeza,

Thank you so much--really appreciate it. Unfortunately, I need to keep the second variable (DaysToEvent) because I need to take the difference between DaytoEvent_1- DaystoEvent, DaystoEvent_2-DaystoEvent, etc. Is there a way to keep the DaystoEvent column?

Thank you so much,

Louise

Reeza
Super User

I think this is the same: DaysToEvent(n)-daystoevent(1) be the same  as DaysToEvent(n)-DaysToEvent?


*Get max count of records;

proc sql;

select max(count) into :max_count from

(select count(ID) as count from Have

group by ID);

quit;


*Transpose;

proc sort data=have; by ID daystoEvent; run;


proc transpose data=have out=want prefix=Days_to_event;

by ID;

var DaysToEvent;

run;


*calculate differences;


data want2;

set want;

array days2(*) days_to_event1-days_to_event&max_count;

array diff(*) days_diff1-days_diff&max_count;


do i=1 to dim(days2);

diff(i)=days2(i)-days2(1);

end;

run;

Ksharp
Super User

Did I miss something ? Why not just a simple MERGE ?

data have;
  input ID $ DaystoEvent;
  cards;
A 100
A 301
A 499
B 200
B 250
C 100
;
run;

proc transpose data=have out=temp(drop=_name_) prefix=Days_to_event;
by ID;
var DaysToEvent;
run;
data want;
 merge have temp;
 by id;
run;

Xia Keshan

art297
Opal | Level 21

Similar to Fareeza's code, but producing the extra lines as well:

data work;

  input ID $ DaystoEvent;

  cards;

A 100

A 301

A 499

B 200

B 250

C 100

;

proc transpose data=work out=need(drop=_:) prefix=DaystoEvent;

  by id;

run;

data want (drop=counter);

  length id $5;

  format daystoevent best12.;

  set need;

  by ID;

  array days(3) DaystoEvent1-DaystoEvent3;

  counter=1;

  do while(counter le 3 and not missing(days(counter)));

    Daystoevent=Days(counter);

    counter+1;

    output;

  end;

run;

LLW
Fluorite | Level 6 LLW
Fluorite | Level 6

Hi Arthur,

Thank you so much! Is there a way to make the array number for DaystoEvent a variable as opposed to just 3? This is because in my sample data, I know that the largest number of records per ID is 3. However because I have such a large dataset (300,000 IDs) with varied #'s of records per ID, I'm not sure sure how many variables of DaystoEvent I will need for the array in the actual dataset.

Thank you so much,

Louise

art297
Opal | Level 21

You could use:

data work;

  input ID $ DaystoEvent;

  cards;

A 100

A 301

A 499

B 200

B 250

C 100

;

proc transpose data=work out=need(drop=_:) prefix=DaystoEvent_;

  by id;

run;

data want (drop=counter);

  length id $5;

  format daystoevent best12.;

  set need;

  by ID;

  array days(*) DaystoEvent_:;

  counter=1;

  do while(counter le dim(days) and not missing(days(counter)));

    Daystoevent=Days(counter);

    counter+1;

    output;

  end;

run;

naveen_srini
Quartz | Level 8

data work;

  input ID $ DaystoEvent;

  cards;

A 100

A 301

A 499

B 200

B 250

C 100

;


proc sql; /* copied from Reeza's*/
select max(count) into :max_count separated by ' ' from
(select count(ID) as count from work
  group by ID);
quit;


data want;
do _n_=1 by 1 until(last.id);
  set work;
  by id;
  array days(*) Daystoevent1-Daystoevent&max_count;
  days(_n_)=DaystoEvent;
end;

do until(last.id);
  set work;
  by id;
  output;
end;
run;

Regards,

Naveen Srinivasan

L&T Infotech

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

Sorry, still not clear why you need a transposed dataset.  As you state, there could be many per id, so you could end up with thousands of columns.

Why not just assign a number in the dataset, then using merging to get days from:

data have;

  input ID $ DaystoEvent;

datalines;

A 100

A 301

A 499

B 200

B 250

C 100

;

run;

data have;

  set have;

  by id;

  retain no;

  if _n_=1 or first.id then no=1;

  else no=no+1;

run;

proc sql;

  create table WANT as

  select  A.ID,

          A.NO,

          A.DAYSTOEVENT,

          A.DAYSTOEVENT - B.DAYSTOEVENT as RES

  from    WORK.HAVE A

  left join (select * from WORK.HAVE where NO=1) B

  on      A.ID=B.ID

  order by A.ID,

           A.NO;

quit;

Astounding
PROC Star

At least 90% of the time, this sort of restructuring of the data is not necessary.  If you were to post a different question, describing the analysis you would like to perform, you would likely receive many answers that utilize your original structure to the data.

Good luck.

LLW
Fluorite | Level 6 LLW
Fluorite | Level 6

Hi everyone!

This was my first time posting a question, and thank you to everyone for your help. I have received so many different ways of getting what I wanted. Part of the reason that I wanted all the "DaysToEvents" in columns was because I need to find the difference between all combinations of the difference of DaystoEvents by ID, and I thought that the best way to take differences was to use columns. But Astounding is right, I will have very many columns of new DaystoEvents (233 to be exact!).

I will think about how to move forward with the analysis that I want using this data structure, given the large columns.

Thank you to everyone!

Louise

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 3638 views
  • 10 likes
  • 7 in conversation