BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Iron_grief
Calcite | Level 5

Greetings Community,

I would like help with the following issue: is there a way to create arrays when working with datasets in long format?

I usually work with a wide dataset with the following configuration:

data wide;

input id time1 time2 time3 group1 group2 group3 outcome1 outcome2 outcome3;

cards;

1 20 30 40 0 1 0 0 1 0

2 10 15 20 0 0 1 0 0 0

;

On this dataset, I perform time-dependent survival analysis and I am able to censor cases which had developed in previous time periods (i.e., in time period 3, subject 1 would be excluded since he developed the outcome in time period 2). I manage to do this with arrays.

Now, I am working with an Access-derived dataset, which is formatted in long fashion like this:

data long;

input id visit time group outcome;

cards;

1 1 20 0 0

1 2 30 1 1

1 3 40 0 0

2 1 10 0 0

2 2 15 0 0

2 3 20 1 0

;

How would you group the set of outcomes from the same patient, in example to exclude the third observation (and any other after an outcome has occurres, for instance)?

Many thanks,

Manuel

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

Manuel,

SAS has some standard tools, that I highly recommend you learn about.  They begin with understanding the BY statement within a DATA step.  Here's an example (untested code):

proc sort data=long;

   by id visit;

run;

data want;

   set long;

   by id visit;

   if first.id then delete_the_rest='N';

   retain delete_the_rest;

   if delete_the_rest='Y' then delete;

   if outcome=1 then delete_the_rest='Y';

run;

Just a few notes ...  The PROC SORT is not necessary if the data are in order.  The BY statement in the data step requires sorted data, but it does require you to run PROC SORT if the data are already in order.  And the order of the last two statements is important.  If you switch them, then you will also delete the observation having outcome=1.

But to reiterate the most important point ... in the long run, it is vital to understand the BY statement in the DATA step!

Good luck.

View solution in original post

7 REPLIES 7
Linlin
Lapis Lazuli | Level 10

do you want to try the code below?

data long;

input id visit time group outcome;

cards;

1 1 20 0 0

1 2 30 1 1

1 3 40 0 0

2 1 10 0 0

2 2 15 0 0

2 3 20 1 0

;

data want;

  set long;

  by id;

  if outcome<=lag(outcome) and lag(outcome)>0 then delete;

run;

proc print;run;

                obs    id    visit    time    group    outcome

                   1      1      1       20       0         0

                   2      1      2       30       1         1

                   3      2      1       10       0         0

                   4      2      2       15       0         0

                   5      2      3       20       1         0

Ksharp
Super User

It looks like you are transposing data from long to wide.

data long;
input id visit time group outcome;
cards;
1 1 20 0 0
1 2 30 1 1
1 3 40 0 0
2 1 10 0 0
2 2 15 0 0
2 3 20 1 0
;
run;
proc sql noprint;
 select max(count) into : count
  from (select count(*) as count from long group by id);
quit;
 data want(keep=id _:);
  array _visit{&count} ;
  array _time{&count} ;
  array _group{&count} ;
  array _outcome{&count} ;
  count=0;
  do until(last.id);
   set long;
   by id;
   count+1;
   _visit{count}=visit;_time{count}=time;
   _group{count}=group;_outcome{count}=outcome;
  end;
 run;




Ksharp

Astounding
PROC Star

Manuel,

SAS has some standard tools, that I highly recommend you learn about.  They begin with understanding the BY statement within a DATA step.  Here's an example (untested code):

proc sort data=long;

   by id visit;

run;

data want;

   set long;

   by id visit;

   if first.id then delete_the_rest='N';

   retain delete_the_rest;

   if delete_the_rest='Y' then delete;

   if outcome=1 then delete_the_rest='Y';

run;

Just a few notes ...  The PROC SORT is not necessary if the data are in order.  The BY statement in the data step requires sorted data, but it does require you to run PROC SORT if the data are already in order.  And the order of the last two statements is important.  If you switch them, then you will also delete the observation having outcome=1.

But to reiterate the most important point ... in the long run, it is vital to understand the BY statement in the DATA step!

Good luck.

Linlin
Lapis Lazuli | Level 10

Hi Astounding,

I tested your code for you. It works great!  Thank you for sharing!  - Linlin

data long;

input id visit time group outcome;

cards;

1 1 20 0 0

1 2 30 1 1

1 3 40 0 0

2 1 10 0 0

2 2 15 0 0

2 3 20 1 0

;

proc sort data=long;

   by id visit;

run;

data want;

   set long;

   by id visit;

   if first.id then delete_the_rest='N';

   retain delete_the_rest;

   if delete_the_rest='Y' then delete;

   if outcome=1 then delete_the_rest='Y';

run;

proc print;run;

                    Obs    id    visit    time    group    outcome    delete_the_rest

                     1      1      1       20       0         0          N

                     2      1      2       30       1         1          Y

                     3      2      1       10       0         0          N

                     4      2      2       15       0         0          N

                     5      2      3       20       1         0          N

Iron_grief
Calcite | Level 5

First off, thank you all.

@Linlin: I didn't know of the LAG function, I guess this is one of the functions I need to properly work with groups of observations referring to the same subject in long format, together with the BY statement.

@Astounding: your code works for me, thanks. I just don't get why SAS deletes the "Y"s since the DELETE statement comes before the "Y" definition: I mean, when SAS reads the DELETE statement no "Y" exists yet, right? I am just trying to understand the mechanics of your code, since it works great.

Astounding
PROC Star

Iron_grief,

There are two keys to understanding when "Y" exists.  First is that the DATA step processes each observation sequentially.  The statements in the DATA step execute in order for every observation individually.  (The exception in this program is when the DELETE statement executes.  In that case, the final statement will not execute for that observation.)  The second key is the RETAIN statement.  That's what tells SAS, "As you move on to the next observation, don't change DELETE_THE_REST.  Just let the current value sit there as the next observation gets processed."  All good stuff to know in the long run.

Iron_grief
Calcite | Level 5

Astounding, I agree with you that it is good to know, that's why I asked for insights.

Thank you again.

Manuel

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1645 views
  • 3 likes
  • 4 in conversation