BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
LLW
Fluorite | Level 6 LLW
Fluorite | Level 6

Hi everyone,

I'm a student who is relatively new to SAS, and I haven't been able to solve this problem. Thank you in advance!! I have a dataset composed of a couple hundred thousand entries grouped by ID. I want to be able to use the retain function by id and for each of the entries.

An example of my dataset:

IDDaystoEvent
A100
A301
A499
B200
B250
C100

What I would like in my dataset:
IDDaystoEvent1_DaystoEvent2_DaystoEvent3_DaystoEvent
A100100301499
A301100301499
A499100301499
B200200250.
B250200250.
C100100..

Basically, I'd like to use the "retain" function for the variable DaystoEvent for ALL the row records by "ID". I cannot create a simple array because each "ID" has a different number of entries, and my dataset is over 300,000 "Ids". I have figured out how to use the retain function for the first.ID (example below), but I cannot figure out how to do it for all records by ID.

data work;

set work;

by I

Retain R_Daystoevent;

If first.ID then do;

R_Daystoevent=Daystoevent;

end;

Thank you!

Louise

1 ACCEPTED SOLUTION

Accepted Solutions
naveen_srini
Quartz | Level 8

data work;

  input ID $ DaystoEvent;

  cards;

A 100

A 301

A 499

B 200

B 250

C 100

;


proc sql; /* copied from Reeza's*/
select max(count) into :max_count separated by ' ' from
(select count(ID) as count from work
  group by ID);
quit;


data want;
do _n_=1 by 1 until(last.id);
  set work;
  by id;
  array days(*) Daystoevent1-Daystoevent&max_count;
  days(_n_)=DaystoEvent;
end;

do until(last.id);
  set work;
  by id;
  output;
end;
run;

Regards,

Naveen Srinivasan

L&T Infotech

View solution in original post

11 REPLIES 11
Reeza
Super User

Do you absolutely need the second column?

If not proc transpose is a direct solution.


I also recommend naming your variables Days_to_event1 Days_to_event2. This way you can use the shortcut reference for variables if required.


proc sort data=have; by ID daystoEvent; run;


proc transpose data=have out=want prefix=Days_to_event;

by ID;

var DaysToEvent;

run;

LLW
Fluorite | Level 6 LLW
Fluorite | Level 6

Hi Fareeza,

Thank you so much--really appreciate it. Unfortunately, I need to keep the second variable (DaysToEvent) because I need to take the difference between DaytoEvent_1- DaystoEvent, DaystoEvent_2-DaystoEvent, etc. Is there a way to keep the DaystoEvent column?

Thank you so much,

Louise

Reeza
Super User

I think this is the same: DaysToEvent(n)-daystoevent(1) be the same  as DaysToEvent(n)-DaysToEvent?


*Get max count of records;

proc sql;

select max(count) into :max_count from

(select count(ID) as count from Have

group by ID);

quit;


*Transpose;

proc sort data=have; by ID daystoEvent; run;


proc transpose data=have out=want prefix=Days_to_event;

by ID;

var DaysToEvent;

run;


*calculate differences;


data want2;

set want;

array days2(*) days_to_event1-days_to_event&max_count;

array diff(*) days_diff1-days_diff&max_count;


do i=1 to dim(days2);

diff(i)=days2(i)-days2(1);

end;

run;

Ksharp
Super User

Did I miss something ? Why not just a simple MERGE ?

data have;
  input ID $ DaystoEvent;
  cards;
A 100
A 301
A 499
B 200
B 250
C 100
;
run;

proc transpose data=have out=temp(drop=_name_) prefix=Days_to_event;
by ID;
var DaysToEvent;
run;
data want;
 merge have temp;
 by id;
run;

Xia Keshan

art297
Opal | Level 21

Similar to Fareeza's code, but producing the extra lines as well:

data work;

  input ID $ DaystoEvent;

  cards;

A 100

A 301

A 499

B 200

B 250

C 100

;

proc transpose data=work out=need(drop=_:) prefix=DaystoEvent;

  by id;

run;

data want (drop=counter);

  length id $5;

  format daystoevent best12.;

  set need;

  by ID;

  array days(3) DaystoEvent1-DaystoEvent3;

  counter=1;

  do while(counter le 3 and not missing(days(counter)));

    Daystoevent=Days(counter);

    counter+1;

    output;

  end;

run;

LLW
Fluorite | Level 6 LLW
Fluorite | Level 6

Hi Arthur,

Thank you so much! Is there a way to make the array number for DaystoEvent a variable as opposed to just 3? This is because in my sample data, I know that the largest number of records per ID is 3. However because I have such a large dataset (300,000 IDs) with varied #'s of records per ID, I'm not sure sure how many variables of DaystoEvent I will need for the array in the actual dataset.

Thank you so much,

Louise

art297
Opal | Level 21

You could use:

data work;

  input ID $ DaystoEvent;

  cards;

A 100

A 301

A 499

B 200

B 250

C 100

;

proc transpose data=work out=need(drop=_:) prefix=DaystoEvent_;

  by id;

run;

data want (drop=counter);

  length id $5;

  format daystoevent best12.;

  set need;

  by ID;

  array days(*) DaystoEvent_:;

  counter=1;

  do while(counter le dim(days) and not missing(days(counter)));

    Daystoevent=Days(counter);

    counter+1;

    output;

  end;

run;

naveen_srini
Quartz | Level 8

data work;

  input ID $ DaystoEvent;

  cards;

A 100

A 301

A 499

B 200

B 250

C 100

;


proc sql; /* copied from Reeza's*/
select max(count) into :max_count separated by ' ' from
(select count(ID) as count from work
  group by ID);
quit;


data want;
do _n_=1 by 1 until(last.id);
  set work;
  by id;
  array days(*) Daystoevent1-Daystoevent&max_count;
  days(_n_)=DaystoEvent;
end;

do until(last.id);
  set work;
  by id;
  output;
end;
run;

Regards,

Naveen Srinivasan

L&T Infotech

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

Sorry, still not clear why you need a transposed dataset.  As you state, there could be many per id, so you could end up with thousands of columns.

Why not just assign a number in the dataset, then using merging to get days from:

data have;

  input ID $ DaystoEvent;

datalines;

A 100

A 301

A 499

B 200

B 250

C 100

;

run;

data have;

  set have;

  by id;

  retain no;

  if _n_=1 or first.id then no=1;

  else no=no+1;

run;

proc sql;

  create table WANT as

  select  A.ID,

          A.NO,

          A.DAYSTOEVENT,

          A.DAYSTOEVENT - B.DAYSTOEVENT as RES

  from    WORK.HAVE A

  left join (select * from WORK.HAVE where NO=1) B

  on      A.ID=B.ID

  order by A.ID,

           A.NO;

quit;

Astounding
PROC Star

At least 90% of the time, this sort of restructuring of the data is not necessary.  If you were to post a different question, describing the analysis you would like to perform, you would likely receive many answers that utilize your original structure to the data.

Good luck.

LLW
Fluorite | Level 6 LLW
Fluorite | Level 6

Hi everyone!

This was my first time posting a question, and thank you to everyone for your help. I have received so many different ways of getting what I wanted. Part of the reason that I wanted all the "DaysToEvents" in columns was because I need to find the difference between all combinations of the difference of DaystoEvents by ID, and I thought that the best way to take differences was to use columns. But Astounding is right, I will have very many columns of new DaystoEvents (233 to be exact!).

I will think about how to move forward with the analysis that I want using this data structure, given the large columns.

Thank you to everyone!

Louise

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1790 views
  • 10 likes
  • 7 in conversation