DATA Step, Macro, Functions and more

Arrays in long dataset?

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 15
Accepted Solution

Arrays in long dataset?

Greetings Community,

I would like help with the following issue: is there a way to create arrays when working with datasets in long format?

I usually work with a wide dataset with the following configuration:

data wide;

input id time1 time2 time3 group1 group2 group3 outcome1 outcome2 outcome3;

cards;

1 20 30 40 0 1 0 0 1 0

2 10 15 20 0 0 1 0 0 0

;

On this dataset, I perform time-dependent survival analysis and I am able to censor cases which had developed in previous time periods (i.e., in time period 3, subject 1 would be excluded since he developed the outcome in time period 2). I manage to do this with arrays.

Now, I am working with an Access-derived dataset, which is formatted in long fashion like this:

data long;

input id visit time group outcome;

cards;

1 1 20 0 0

1 2 30 1 1

1 3 40 0 0

2 1 10 0 0

2 2 15 0 0

2 3 20 1 0

;

How would you group the set of outcomes from the same patient, in example to exclude the third observation (and any other after an outcome has occurres, for instance)?

Many thanks,

Manuel


Accepted Solutions
Solution
‎03-27-2012 09:28 AM
Super User
Posts: 5,511

Re: Arrays in long dataset?

Posted in reply to Iron_grief

Manuel,

SAS has some standard tools, that I highly recommend you learn about.  They begin with understanding the BY statement within a DATA step.  Here's an example (untested code):

proc sort data=long;

   by id visit;

run;

data want;

   set long;

   by id visit;

   if first.id then delete_the_rest='N';

   retain delete_the_rest;

   if delete_the_rest='Y' then delete;

   if outcome=1 then delete_the_rest='Y';

run;

Just a few notes ...  The PROC SORT is not necessary if the data are in order.  The BY statement in the data step requires sorted data, but it does require you to run PROC SORT if the data are already in order.  And the order of the last two statements is important.  If you switch them, then you will also delete the observation having outcome=1.

But to reiterate the most important point ... in the long run, it is vital to understand the BY statement in the DATA step!

Good luck.

View solution in original post


All Replies
Super Contributor
Posts: 1,636

Arrays in long dataset?

Posted in reply to Iron_grief

do you want to try the code below?

data long;

input id visit time group outcome;

cards;

1 1 20 0 0

1 2 30 1 1

1 3 40 0 0

2 1 10 0 0

2 2 15 0 0

2 3 20 1 0

;

data want;

  set long;

  by id;

  if outcome<=lag(outcome) and lag(outcome)>0 then delete;

run;

proc print;run;

                obs    id    visit    time    group    outcome

                   1      1      1       20       0         0

                   2      1      2       30       1         1

                   3      2      1       10       0         0

                   4      2      2       15       0         0

                   5      2      3       20       1         0

Super User
Posts: 10,035

Arrays in long dataset?

Posted in reply to Iron_grief

It looks like you are transposing data from long to wide.

data long;
input id visit time group outcome;
cards;
1 1 20 0 0
1 2 30 1 1
1 3 40 0 0
2 1 10 0 0
2 2 15 0 0
2 3 20 1 0
;
run;
proc sql noprint;
 select max(count) into : count
  from (select count(*) as count from long group by id);
quit;
 data want(keep=id _:);
  array _visit{&count} ;
  array _time{&count} ;
  array _group{&count} ;
  array _outcome{&count} ;
  count=0;
  do until(last.id);
   set long;
   by id;
   count+1;
   _visit{count}=visit;_time{count}=time;
   _group{count}=group;_outcome{count}=outcome;
  end;
 run;




Ksharp

Solution
‎03-27-2012 09:28 AM
Super User
Posts: 5,511

Re: Arrays in long dataset?

Posted in reply to Iron_grief

Manuel,

SAS has some standard tools, that I highly recommend you learn about.  They begin with understanding the BY statement within a DATA step.  Here's an example (untested code):

proc sort data=long;

   by id visit;

run;

data want;

   set long;

   by id visit;

   if first.id then delete_the_rest='N';

   retain delete_the_rest;

   if delete_the_rest='Y' then delete;

   if outcome=1 then delete_the_rest='Y';

run;

Just a few notes ...  The PROC SORT is not necessary if the data are in order.  The BY statement in the data step requires sorted data, but it does require you to run PROC SORT if the data are already in order.  And the order of the last two statements is important.  If you switch them, then you will also delete the observation having outcome=1.

But to reiterate the most important point ... in the long run, it is vital to understand the BY statement in the DATA step!

Good luck.

Super Contributor
Posts: 1,636

Re: Arrays in long dataset?

Posted in reply to Astounding

Hi Astounding,

I tested your code for you. It works great!  Thank you for sharing!  - Linlin

data long;

input id visit time group outcome;

cards;

1 1 20 0 0

1 2 30 1 1

1 3 40 0 0

2 1 10 0 0

2 2 15 0 0

2 3 20 1 0

;

proc sort data=long;

   by id visit;

run;

data want;

   set long;

   by id visit;

   if first.id then delete_the_rest='N';

   retain delete_the_rest;

   if delete_the_rest='Y' then delete;

   if outcome=1 then delete_the_rest='Y';

run;

proc print;run;

                    Obs    id    visit    time    group    outcome    delete_the_rest

                     1      1      1       20       0         0          N

                     2      1      2       30       1         1          Y

                     3      2      1       10       0         0          N

                     4      2      2       15       0         0          N

                     5      2      3       20       1         0          N

Occasional Contributor
Posts: 15

Re: Arrays in long dataset?

Posted in reply to Iron_grief

First off, thank you all.

@Linlin: I didn't know of the LAG function, I guess this is one of the functions I need to properly work with groups of observations referring to the same subject in long format, together with the BY statement.

@Astounding: your code works for me, thanks. I just don't get why SAS deletes the "Y"s since the DELETE statement comes before the "Y" definition: I mean, when SAS reads the DELETE statement no "Y" exists yet, right? I am just trying to understand the mechanics of your code, since it works great.

Super User
Posts: 5,511

Re: Arrays in long dataset?

Posted in reply to Iron_grief

Iron_grief,

There are two keys to understanding when "Y" exists.  First is that the DATA step processes each observation sequentially.  The statements in the DATA step execute in order for every observation individually.  (The exception in this program is when the DELETE statement executes.  In that case, the final statement will not execute for that observation.)  The second key is the RETAIN statement.  That's what tells SAS, "As you move on to the next observation, don't change DELETE_THE_REST.  Just let the current value sit there as the next observation gets processed."  All good stuff to know in the long run.

Occasional Contributor
Posts: 15

Re: Arrays in long dataset?

Posted in reply to Astounding

Astounding, I agree with you that it is good to know, that's why I asked for insights.

Thank you again.

Manuel

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 460 views
  • 3 likes
  • 4 in conversation