Help using Base SAS procedures

Help

Accepted Solution Solved
Reply
Regular Contributor
Posts: 168
Accepted Solution

Help

Hi All,

this is my part of dataset

 

SUBJID             PSADN            days       pcfn

70007             08DEC2008       -8              .

70007           15DEC2008         -1              .

70007            05JAN2009        21           1.2

70007            26JAN2009       42           -16.7

70007             23FEB2009     70             -10.6

70007           16MAR2009      91              47.1

70007            06APR2009      112           32.0

70007             28APR2009     134            8.2

70007            18MAY2009      154           46.1

70007            08JUN2009       175           39.6

70007          29JUN2009         196            55.0

70007           20JUL2009         217           51.8

70007          08SEP2009         267          149.9

70007           28SEP2009        287 1          114.5

i have to pick only those records which having pcfn>25. if pcfn>25 , in next records also it is greater than 25 . In the above example 16MAR2009  pcfn value is 47.1 but later the subject at 18may2009 he had <25 . So i dont want to consider those 2 records also. My final o/p be like

SUBJID             PSADN            days       pcfn        newdate                   flag

70007            18MAY2009      154           46.1       18MAY2009              1

70007            08JUN2009       175           39.6       18MAY2009              0

70007          29JUN2009         196            55.0       18MAY2009            0

70007           20JUL2009         217           51.8        18MAY2009           0

70007          08SEP2009         267          149.9       18MAY2009          0

70007           28SEP2009        287 1          114.5     18MAY2009         0

Thanks in advance!!!!!!!!


Accepted Solutions
Solution
‎09-06-2012 03:34 PM
Valued Guide
Posts: 765

Re: Help

hi ... one idea ...

data x;

input subjid :$5. psadn :date9.  days  pcfn;

datalines;

70007    08DEC2008     -8      .

70007    15DEC2008     -1      .

70007    05JAN2009     21     1.2

70007    26JAN2009     42   -16.7

70007    23FEB2009     70   -10.6

70007    16MAR2009     91    47.1

70007    06APR2009    112    32.0

70007    28APR2009    134     8.2

70007    18MAY2009    154    46.1

70007    08JUN2009    175    39.6

70007    29JUN2009    196    55.0

70007    20JUL2009    217    51.8

70007    08SEP2009    267   149.9

70007    28SEP2009    287   114.5

;

proc sort data=x;

by subjid days;

run;

data y;

do until (last.subjid);

   set x (in=one) x;

   by subjid;

   if one and pcfn le 25 then call missing(newdate);

   else

   newdate = ifn(one and missing(newdate), psadn , newdate);

   flag = psadn eq newdate;

   if ^one and psadn ge newdate then output;

end;

format newdate date9.;

run;

subjid        psadn    days     pcfn      newdate    flag

70007     18MAY2009     154     46.1    18MAY2009      1

70007     08JUN2009     175     39.6    18MAY2009      0

70007     29JUN2009     196     55.0    18MAY2009      0

70007     20JUL2009     217     51.8    18MAY2009      0

70007     08SEP2009     267    149.9    18MAY2009      0

70007     28SEP2009     287    114.5    18MAY2009      0

View solution in original post


All Replies
Solution
‎09-06-2012 03:34 PM
Valued Guide
Posts: 765

Re: Help

hi ... one idea ...

data x;

input subjid :$5. psadn :date9.  days  pcfn;

datalines;

70007    08DEC2008     -8      .

70007    15DEC2008     -1      .

70007    05JAN2009     21     1.2

70007    26JAN2009     42   -16.7

70007    23FEB2009     70   -10.6

70007    16MAR2009     91    47.1

70007    06APR2009    112    32.0

70007    28APR2009    134     8.2

70007    18MAY2009    154    46.1

70007    08JUN2009    175    39.6

70007    29JUN2009    196    55.0

70007    20JUL2009    217    51.8

70007    08SEP2009    267   149.9

70007    28SEP2009    287   114.5

;

proc sort data=x;

by subjid days;

run;

data y;

do until (last.subjid);

   set x (in=one) x;

   by subjid;

   if one and pcfn le 25 then call missing(newdate);

   else

   newdate = ifn(one and missing(newdate), psadn , newdate);

   flag = psadn eq newdate;

   if ^one and psadn ge newdate then output;

end;

format newdate date9.;

run;

subjid        psadn    days     pcfn      newdate    flag

70007     18MAY2009     154     46.1    18MAY2009      1

70007     08JUN2009     175     39.6    18MAY2009      0

70007     29JUN2009     196     55.0    18MAY2009      0

70007     20JUL2009     217     51.8    18MAY2009      0

70007     08SEP2009     267    149.9    18MAY2009      0

70007     28SEP2009     287    114.5    18MAY2009      0

Super Contributor
Posts: 1,636

Re: Help

one way:

data have;

infile cards missover;

input SUBJID             PSADN $           days       pcfn;

cards;

70007             08DEC2008       -8              .

70007           15DEC2008         -1              .

70007            05JAN2009        21           1.2

70007            26JAN2009       42           -16.7

70007             23FEB2009     70             -10.6

70007           16MAR2009      91              47.1

70007            06APR2009      112           32.0

70007             28APR2009     134            8.2

70007            18MAY2009      154           46.1

70007            08JUN2009       175           39.6

70007          29JUN2009         196            55.0

70007           20JUL2009         217           51.8

70007          08SEP2009         267          149.9

70007           28SEP2009        287         114.5

;

data temp;

  set have;

  by subjid;

  if first.subjid then group+1;

  if not first.subjid and pcfn>25 and lag(pcfn)<=25 then group+1;

  run;

  proc sql;

    create table want as

  select * from temp

    where group in (

   select group from temp

     group by group

     having min(pcfn)>25);

  quit;

proc print;run;

Regular Contributor
Posts: 168

Re: Help

Thanks to you both!!!!!!!

Hi LinLin your apporach is good

Hi mike,

I got what i want but i have small question
what does it mean "Interleaving a Dataset with Itself' ?

could you explain little bit

PROC Star
Posts: 7,492

Re: Help

Did you read the article that Mike mentioned in your previous thread? i.e.:

ps  "Interleaving a Dataset with Itself: How and Why" by Howard Schreier

http://www.nesug.org/proceedings/nesug03/cc/cc002.pdf


It does a nice job of explaining the process.


Valued Guide
Posts: 765

Re: Help

Hi ... as Art said, Howard's article is great.

Quickly, "official" definition of INTERLEAVE ...

"Interleaving uses a SET statement and a BY statement to combine multiple data sets into one new data set. The number of observations in the new data set is the sum of the number of observations from the original data sets. However, the observations in the new data set are arranged by the values of the BY variable or variables and, within each BY group, by the order of the data sets in which they occur. You can interleave data sets either by using a BY variable or by using an index."
So, if you specify the same data set name twice ...
set x (in=one) x;
by subjid;
the data step makes two passes through your data for each unique value of SUBJID
that allows you to make decisions about the content of your data during the first pass (in your data,  look for the last observation within a SUBJID where all subsequent values of PCFN are > 25) and in the second pass, you apply the rule formulated in the first pass (in your data, output only those observations with a date => the date found in the first pass)

Howard explains all this in his paper (it was the first place I saw anyone interleave a data set with itself ... among the many things I have learned from Howard's paper, talks, and postings).

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 221 views
  • 3 likes
  • 4 in conversation