Help using Base SAS procedures

By Group -- Replace MISSING VALUES ALONG CONSECUTIVE DATES - (Repost)

Accepted Solution Solved
Reply
Contributor
Posts: 68
Accepted Solution

By Group -- Replace MISSING VALUES ALONG CONSECUTIVE DATES - (Repost)

Dear all,

In a by group (id1,id2, TRADE_QTR - grouping vars)  I have  missing values in variable STATUS  for  various Trade_Quarters.

I would like to replace the STATUS missing values based on the following:

-- If STATUS = 'C' (=cancelled) then if any  SUBSEQUENT TRADE_QUARTERS take STATUS ='C'

-- If STATUS = 'C' (=cancelled) then if any PRIOR  TRADE_QUARTERS take STATUS = 'V'  (=Valid)

-- If there is no STATUS = 'C' then ALL values of STATUS become = 'V'

Pls. see data HAVE and WANT in the attached file.


Thanking you in advance for any hint.


Kind regards


Nikos


DATA    HAVEDATA WANT
STATUSID1ID2TRADE_QTRSTATUS_QTRPRICESTATUSID1ID2TRADE_QTRSTATUS_QTRPRICE
FG52762009Q1.0VFG52762009Q1.0
FG52762009Q2.0VFG52762009Q2.0
VFG52762009Q32009Q30VFG52762009Q32009Q30
CFG52762009Q42009Q418508CFG52762009Q42009Q418508
FG52762010Q1.0CFG52762010Q1.0
FG52762010Q2.0CFG52762010Q2.0
FG52762010Q3.0CFG52762010Q3.0
FG52762010Q4.0CFG52762010Q4.0
FG52762011Q1.0CFG52762011Q1.0
FG52762011Q2.0CFG52762011Q2.0
FG52762011Q3.0CFG52762011Q3.0
FG52762011Q4.0CFG52762011Q4.0
FG52762012Q1.0CFG52762012Q1.0
FG52762012Q2.0CFG52762012Q2.0
FG52762012Q3.0CFG52762012Q3.0
FG52762012Q4.0CFG52762012Q4.0
FG52762013Q1.0CFG52762013Q1.0
FG52762013Q2.0CFG52762013Q2.0
FG52762013Q3.0CFG52762013Q3.0
FG52762013Q4.0CFG52762013Q4.0
FG52762014Q1.0CFG52762014Q1.0
FG52762014Q2.0CFG52762014Q2.0
KL46361862009Q1.0VKL4632009Q1.0
KL46361862009Q2.0VKL4632009Q2.0
VKL46361862009Q32009Q30VKL4632009Q32009Q30
VKL46361862009Q42009Q40VKL4632009Q42009Q40
VKL46361862010Q12010Q10VKL4632010Q12010Q10
VKL46361862010Q22010Q20VKL4632010Q22010Q20
VKL46361862010Q32010Q30VKL4632010Q32010Q30
VKL46361862010Q42010Q40VKL4632010Q42010Q40
VKL46361862011Q12011Q10VKL4632011Q12011Q10
CKL46361862011Q22011Q20CKL4632011Q22011Q20
KL46361862011Q3.0CKL4632011Q3.0
KL46361862011Q4.0CKL4632011Q4.0
KL46361862012Q1.0CKL4632012Q1.0
KL46361862012Q2.0CKL4632012Q2.0
KL46361862012Q3.0CKL4632012Q3.0
KL46361862012Q4.0CKL4632012Q4.0
KL46361862013Q1.0CKL4632013Q1.0
KL46361862013Q2.0CKL4632013Q2.0
KL46361862013Q3.0CKL4632013Q3.0
KL46361862013Q4.0CKL4632013Q4.0
KL46361862014Q1.0CKL4632014Q1.0
KL46361862014Q2.0CKL4632014Q2.0
JK46366502009Q1.0VJK6502009Q1.0
JK46366502009Q2.0VJK6502009Q2.0
CJK46366502009Q32009Q31637CJK6502009Q32009Q31637
JK46366502009Q4.0CJK6502009Q4.0
JK46366502010Q1.0CJK6502010Q1.0
JK46366502010Q2.0CJK6502010Q2.0
JK46366502010Q3.0CJK6502010Q3.0
JK46366502010Q4.0CJK6502010Q4.0
JK46366502011Q1.0CJK6502011Q1.0
JK46366502011Q2.0CJK6502011Q2.0
JK46366502011Q3.0CJK6502011Q3.0
JK46366502011Q4.0CJK6502011Q4.0
JK46366502012Q1.0CJK6502012Q1.0
JK46366502012Q2.0CJK6502012Q2.0
JK46366502012Q3.0CJK6502012Q3.0
JK46366502012Q4.0CJK6502012Q4.0
JK46366502013Q1.0CJK6502013Q1.0
JK46366502013Q2.0CJK6502013Q2.0
JK46366502013Q3.0CJK6502013Q3.0
JK46366502013Q4.0CJK6502013Q4.0
JK46366502014Q1.0CJK6502014Q1.0
JK46366502014Q2.0CJK6502014Q2.0
VMN3787942009Q12009Q124394VMN7942009Q12009Q124394
MN3787942009Q2.0VMN7942009Q2.0
MN3787942009Q3.0VMN7942009Q3.0
MN3787942009Q4.0VMN7942009Q4.0
MN3787942010Q1.0VMN7942010Q1.0
CMN3787942010Q22010Q2-6296CMN7942010Q22010Q2-6296
MN3787942010Q3.0CMN7942010Q3.0
MN3787942010Q4.0CMN7942010Q4.0
MN3787942011Q1.0CMN7942011Q1.0
MN3787942011Q2.0CMN7942011Q2.0
MN3787942011Q3.0CMN7942011Q3.0
MN3787942011Q4.0CMN7942011Q4.0
MN3787942012Q1.0CMN7942012Q1.0
MN3787942012Q2.0CMN7942012Q2.0
MN3787942012Q3.0CMN7942012Q3.0
MN3787942012Q4.0CMN7942012Q4.0
MN3787942013Q1.0CMN7942013Q1.0
MN3787942013Q2.0CMN7942013Q2.0
MN3787942013Q3.0CMN7942013Q3.0
MN3787942013Q4.0CMN7942013Q4.0
MN3787942014Q1.0CMN7942014Q1.0
MN3787942014Q2.0CMN7942014Q2.0
VBV3787942009Q12009Q10VBV58792009Q12009Q10
VBV3787942009Q22009Q24469VBV58792009Q22009Q24469
VBV3787942009Q32009Q36800VBV58792009Q32009Q36800
BV3787942009Q4.0VBV58792009Q4.0
BV3787942010Q1.0VBV58792010Q1.0
BV3787942010Q2.0VBV58792010Q2.0
BV3787942010Q3.0VBV58792010Q3.0
BV3787942010Q4.0VBV58792010Q4.0
BV3787942011Q1.0VBV58792011Q1.0
BV3787942011Q2.0VBV58792011Q2.0
BV3787942011Q3.0VBV58792011Q3.0
BV3787942011Q4.0VBV58792011Q4.0
BV3787942012Q1.0VBV58792012Q1.0
BV3787942012Q2.0VBV58792012Q2.0
BV3787942012Q3.0VBV58792012Q3.0
BV3787942012Q4.0VBV58792012Q4.0
BV3787942013Q1.0VBV58792013Q1.0
BV3787942013Q2.0VBV58792013Q2.0
BV3787942013Q3.0VBV58792013Q3.0
BV3787942013Q4.0VBV58792013Q4.0
BV3787942014Q1.0VBV58792014Q1.0
CBV3787942014Q22014Q2-4874CBV58792014Q22014Q2-4874
BV58792010Q4.0OBV58792010Q4.0
BV58792011Q1.0OBV58792011Q1.0
BV58792011Q2.0OBV58792011Q2.0
VBV58792011Q32011Q3980OBV58792011Q32011Q3980
BV58792011Q4.0OBV58792011Q4.0
BV58792012Q1.0OBV58792012Q1.0
BV58792012Q2.0OBV58792012Q2.0
BV58792012Q3.0OBV58792012Q3.0
BV58792012Q4.0OBV58792012Q4.0
BV58792013Q1.540OBV58792013Q1.540
BV58792013Q2.0OBV58792013Q2.0
BV58792013Q3.0OBV58792013Q3.0
BV58792013Q4.0OBV58792013Q4.0
BV58792014Q1.0OBV58792014Q1.0
BV58792014Q2.0OBV58792014Q2.0



Accepted Solutions
Solution
‎09-29-2014 08:25 AM
Super User
Posts: 10,020

Re: By Group -- Replace MISSING VALUES ALONG CONSECUTIVE DATES - (Repost)

Assuming there are at most one C in each id1 and id2 group .

data have;
input (STATUS     ID1     ID2     TRADE_QTR ) ($);
cards;
. FG     5276     2009Q1     
. FG     5276     2009Q2     
V     FG     5276     2009Q3     
C     FG     5276     2009Q4     
. FG     5276     2010Q1     
. FG     5276     2010Q2     
. FG     5276     2010Q3     
. FG     5276     2010Q4
. FG     5276     2011Q1     
. FG     5276     2011Q2     
. FG     5276     2011Q3     
. FG     5276     2011Q4     
. FG     5276     2012Q1     
. FG     5276     2012Q2     
. FG     5276     2012Q3     
. FG     5276     2012Q4     
. FG     5276     2013Q1     
. FG     5276     2013Q2     
. FG     5276     2013Q3     
. FG     5276     2013Q4     
. FG     5276     2014Q1     
. FG     5276     2014Q2     
. KL     4636186     2009Q1     
. KL     4636186     2009Q2     
;
run;
data want;
 set have;
 by  ID1     ID2;
 length _status $ 1;
 retain  _status;
 if first.id2 then _status='V' ;
 if status='C' then _status='C';
run;



Xia Keshan

View solution in original post


All Replies
SAS Employee
Posts: 340

Re: By Group -- Replace MISSING VALUES ALONG CONSECUTIVE DATES - (Repost)

Hi Nikos,

I don't give you a detailed solution (because I'm not sure I understood correctly the rules), but I think you can use the technique below to solve this problem.

In your previous post you didn't indicate what is wrong with the provided solutions. You can freely make comments on anyone's solution.

The technique is called double DOW loop, and you can use it in cases, where first you need to pass through one group, calculate something, than use that calculated something in a second pass.

In this case I'm just using it, because I need to look ahead until the next non-missing STATUS or until the end of the group.

I think TRADE_QTR is not a grouping variable, but your dataset is (and should be) grouped by id1 and id2 and sorted by TRADE_QTR.

I am using id1, id2 and STATUS in the by statement, but this is only a trick: I can easily identify consecutive C, consecutive V or consecutive missing STATUSes with it.

Code not tested yet. Tested.

data want;

do until(last.id2 or (last.STATUS and STATUS ne ''));

     set have;

     by id1 id2 STATUS notsorted;

     if first.id2 then do;

          STATUS_prev='';

     end;

end;

/*probably this should be changed, if your logic is different than I think:*/

if STATUS ~= '' and STATUS_prev = '' then STATUS_group='V';

else if STATUS_prev ~='' then STATUS_group=STATUS_prev;

else STATUS_group='V';

do until(last.id2 or (last.STATUS and STATUS ne ''));

     set have;

     by id1 id2 STATUS notsorted;

     if STATUS='' then do;

          STATUS=STATUS_group;

     end;

     output;

end;

retain STATUS_prev;

STATUS_prev=STATUS;

run;

An SQL based solution could also work, but it runs slowly on such problems.

Super Contributor
Posts: 490

Re: By Group -- Replace MISSING VALUES ALONG CONSECUTIVE DATES - (Repost)

Solution ---- (Repost):

data want (drop=newV );

retain newV ' ';

set have;

if _n_=1 and STATUS = ' ' then Do ;

STATUS = 'V';

newV=STATUS;

End;

Else if STATUS in ('C','V') then newV=STATUS;

Else if STATUS=' ' then STATUS=newV;

run;

Could you run this code on the data you have ? And let us know how it is difference from what you want. keeping in mind that the 'O' values for the STATUS is typo as you clarified in your previous post.

Contributor
Posts: 68

Re: By Group -- Replace MISSING VALUES ALONG CONSECUTIVE DATES - (Repost)

Posted in reply to mohamed_zaki

Dear Mohamed,

Unfortunately it gives C across all observations plus it does not take into account the by-grouping

Thank you

Best regards

Super Contributor
Posts: 490

Re: By Group -- Replace MISSING VALUES ALONG CONSECUTIVE DATES - (Repost)

I am running it in your sample data and it give me the desired output.

Without giving C across all observation. Even your data sample in the post is not starting with C and there is no way the code will give C for all the observation except for case when your data starting with C and not having V in any observation! the code is simple and you can trace it easily?

If you could share your entire code or upload the file you are reading from the data. in order to help you further.

Contributor
Posts: 68

Re: By Group -- Replace MISSING VALUES ALONG CONSECUTIVE DATES - (Repost)

Posted in reply to mohamed_zaki

Please see my reply to RW9

Thank you

Super Contributor
Posts: 490

Re: By Group -- Replace MISSING VALUES ALONG CONSECUTIVE DATES - (Repost)

data want (drop=newV );

set have;

by id1 id2;

if (first.id2 ) then do;

  retain newV ' ';

  if STATUS = ' ' then Do ;

  STATUS = 'V';

  newV=STATUS;

  End;

  Else if STATUS in ('C','V') then newV=STATUS;

End;

Else  do;

  If STATUS in ('C','V') then newV=STATUS;

  Else if STATUS=' ' then STATUS=newV;

end;

run;

Super User
Super User
Posts: 7,942

Re: By Group -- Replace MISSING VALUES ALONG CONSECUTIVE DATES - (Repost)

Hi,

Well, have only tried this on a small subset, and I think you are missing something in you logic.  What if status=C and no other occurrence of C?  Then its all missing.  Please clarify your logic.  This should be near to the OP (although I am not typing all that test data in):

data have;
  infile datalines dsd missover dlm=",";
  input status $ id1 $ id2 $ trade_qtr $ status_qtr $ price;
datalines;
,FG,5276,2009Q1,.,0,,
,FG,5276,2009Q2,.,0,,
V,FG,5276,2009Q3,2009Q3,0,
C,FG,5276,2009Q4,2009Q4,18508,
,FG,5276,2010Q1,.,0,,
,FG,5276,2010Q2,.,0,,
,FG,5276,2010Q3,.,0,,
,FG,5276,2010Q4,.,0,,
,FG,5276,2011Q1,.,0,,,
,FG,5276,2011Q2,.,0,,
,FG,5276,2011Q3,.,0,,
,FG,5276,2011Q4,.,
;
run;

data inter;
  set have;
  ind=input(compress(trade_qtr," Q"),best.);
run;

proc sql;
  create table WANT as
  select  A.*,
          case  when A.STATUS="C" and exists(select THIS.TRADE_QTR from WORK.INTER THIS where THIS.ID1=A.ID1 and THIS.ID2=A.ID2 and THIS.IND > A.IND and THIS.STATUS="C") then "C"
                when A.STATUS="C" and exists(select THIS.TRADE_QTR from WORK.INTER THIS where THIS.ID1=A.ID1 and THIS.ID2=A.ID2 and THIS.IND < A.IND and THIS.STATUS="C") then "V"
                when A.STATUS="" and not exists(select THIS.TRADE_QTR from WORK.INTER THIS where THIS.ID1=A.ID1 and THIS.ID2=A.ID2 and THIS.STATUS="C") then "V"
                end as NEW_STATUS
  from    WORK.INTER A;
quit;

Contributor
Posts: 68

Re: By Group -- Replace MISSING VALUES ALONG CONSECUTIVE DATES - (Repost)

Hi,

If STATUS='C" then prior observations get the value "V"  and

forward observations get the value "C"

Thank you.

Contributor
Posts: 68

Re: By Group -- Replace MISSING VALUES ALONG CONSECUTIVE DATES - (Repost)

This is my Data Have and Want

It can only be one "C" per by group.

If there is "C" any prior obs become "V" and any subsequent remain "C"

If there is no "C" in a by group all obs should be "V" 

Thank you

Data HAVEData WANT
Statusid1ID2status_qtrpriceStatusid1ID2status_qtrprice
V83610122010Q20V83610122010Q20
C83610122010Q3200C83610122010Q3200
83610122010Q40C83610122010Q40
83610122011Q10C83610122011Q10
83610122011Q20C83610122011Q20
83610122011Q30C83610122011Q30
83610122011Q40C83610122011Q40
83610122012Q10C83610122012Q10
83610122012Q20C83610122012Q20
83610122012Q30C83610122012Q30
83610122012Q40C83610122012Q40
83610122013Q10C83610122013Q10
83610122013Q20C83610122013Q20
83610122013Q30C83610122013Q30
83610122013Q40C83610122013Q40
83610122014Q10C83610122014Q10
83610122014Q20C83610122014Q20
V111070122009Q12845V111070122009Q12845
111070122009Q20V111070122009Q20
C111070122009Q38000C111070122009Q38000
111070122009Q40C111070122009Q40
111070122010Q10C111070122010Q10
111070122010Q20C111070122010Q20
111070122010Q30C111070122010Q30
111070122010Q40C111070122010Q40
111070122011Q10C111070122011Q10
111070122011Q20C111070122011Q20
111070122011Q30C111070122011Q30
111070122011Q40C111070122011Q40
111070122012Q10C111070122012Q10
111070122012Q20C111070122012Q20
111070122012Q30C111070122012Q30
111070122012Q40C111070122012Q40
111070122013Q10C111070122013Q10
111070122013Q20C111070122013Q20
111070122013Q30C111070122013Q30
111070122013Q40C111070122013Q40
111070122014Q10C111070122014Q10
111070122014Q20C111070122014Q20
18703122010Q29000V18703122010Q29000
18703122010Q30V18703122010Q30
18703122010Q40V18703122010Q40
18703122011Q10V18703122011Q10
18703122011Q20V18703122011Q20
18703122011Q30V18703122011Q30
C18703122011Q40C18703122011Q40
18703122012Q10C18703122012Q10
18703122012Q20C18703122012Q20
18703122012Q30C18703122012Q30
18703122012Q40C18703122012Q40
18703122013Q10C18703122013Q10
18703122013Q20C18703122013Q20
18703122013Q30C18703122013Q30
18703122013Q40C18703122013Q40
18703122014Q10C18703122014Q10
18703122014Q20C18703122014Q20
V14188772010Q30V14188772010Q30
14188772010Q4200V14188772010Q4200
14188772011Q13091V14188772011Q13091
14188772011Q20V14188772011Q20
14188772011Q30V14188772011Q30
14188772011Q40V14188772011Q40
V14188772012Q10V14188772012Q10
14188772012Q20V14188772012Q20
14188772012Q30V14188772012Q30
14188772012Q40V14188772012Q40
C14188772013Q10C14188772013Q10
14188772013Q20C14188772013Q20
14188772013Q30C14188772013Q30
14188772013Q40C14188772013Q40
14188772014Q10C14188772014Q10
14188772014Q20C14188772014Q20
C14188792010Q4200C14188792010Q4200
14188792011Q10C14188792011Q10
14188792011Q20C14188792011Q20
14188792011Q30C14188792011Q30
14188792011Q40C14188792011Q40
14188792012Q10C14188792012Q10
14188792012Q20C14188792012Q20
14188792012Q30C14188792012Q30
14188792012Q40C14188792012Q40
14188792013Q10C14188792013Q10
14188792013Q20C14188792013Q20
14188792013Q30C14188792013Q30
14188792013Q40C14188792013Q40
14188792014Q10C14188792014Q10
14188792014Q20C14188792014Q20
V789112009Q10V789112009Q10
789112009Q20V789112009Q20
789112009Q30V789112009Q30
789112009Q40V789112009Q40
V789112010Q10V789112010Q10
V789112010Q20V789112010Q20
C789112010Q30C789112010Q30
789112010Q40C789112010Q40
789112011Q10C789112011Q10
789112011Q20C789112011Q20
789112011Q30C789112011Q30
789112011Q40C789112011Q40
789112012Q10C789112012Q10
789112012Q20C789112012Q20
789112012Q30C789112012Q30
789112012Q40C789112012Q40
789112013Q10C789112013Q10
789112013Q20C789112013Q20
789112013Q30C789112013Q30
789112013Q40C789112013Q40
789112014Q10C789112014Q10
789112014Q20C789112014Q20
428222222009Q110314V428222222009Q110314
428222222009Q20V428222222009Q20
428222222009Q30V428222222009Q30
V428222222009Q40V428222222009Q40
428222222010Q10V428222222010Q10
428222222010Q20V428222222010Q20
428222222010Q30V428222222010Q30
428222222010Q4555V428222222010Q4555
428222222011Q10V428222222011Q10
428222222011Q20V428222222011Q20
428222222011Q30V428222222011Q30
428222222011Q40V428222222011Q40
428222222012Q10V428222222012Q10
428222222012Q20V428222222012Q20
428222222012Q30V428222222012Q30
428222222012Q40V428222222012Q40
428222222013Q10V428222222013Q10
428222222013Q20V428222222013Q20
428222222013Q30V428222222013Q30
428222222013Q40V428222222013Q40
428222222014Q10V428222222014Q10
428222222014Q20V428222222014Q20
8252232011Q42558C8252232011Q42558
8252232012Q10C8252232012Q10
8252232012Q20C8252232012Q20
8252232012Q30C8252232012Q30
8252232012Q40C8252232012Q40
8252232013Q10C8252232013Q10
C8252232013Q20C8252232013Q20
8252232013Q30C8252232013Q30
8252232013Q40C8252232013Q40
8252232014Q10C8252232014Q10
8252232014Q20C8252232014Q20
Trusted Advisor
Posts: 3,212

Re: By Group -- Replace MISSING VALUES ALONG CONSECUTIVE DATES - (Repost)

Nikos, how big is your data?      ( Say several millions of records some of several Gb's)

When it is really big work on an ordered dataset (reducing io)

When it is rather small, who cares you can probably doe something with a cartesian  product with SQL 

---->-- ja karman --<-----
Contributor
Posts: 68

Re: By Group -- Replace MISSING VALUES ALONG CONSECUTIVE DATES - (Repost)

Hi Jaap,

My data is about 2 million records

Thank you.

Best regards

SAS Employee
Posts: 340

Re: By Group -- Replace MISSING VALUES ALONG CONSECUTIVE DATES - (Repost)

Have you tried my program? If data is already sorted, it needs only 1 (physical) pass through the data.

On the other hand it makes 2 logical passes. None of the solutions will work, that make only one logical pass through the data, because if a value is missing you should read ahead until the end of the group or until a nonmissing value.

Message was edited by: Gergely Bathó

Trusted Advisor
Posts: 3,212

Re: By Group -- Replace MISSING VALUES ALONG CONSECUTIVE DATES - (Repost)

too little information for some choice for a technical solution. Missing is the dataset-sizing, how many Gb's.

As how big is your machine you are running on.  (memory cpu's dasd).

The options are thinking in the logic.

- Getting that record with C isolated and remerged back applying a condition  (That is what SQL will let you do)

- Working in the ordered dataset using IO-access the condition to be found by moving up an other PDW counter (double access)

- Using has tables where the condition key-value is stored in

---->-- ja karman --<-----
Solution
‎09-29-2014 08:25 AM
Super User
Posts: 10,020

Re: By Group -- Replace MISSING VALUES ALONG CONSECUTIVE DATES - (Repost)

Assuming there are at most one C in each id1 and id2 group .

data have;
input (STATUS     ID1     ID2     TRADE_QTR ) ($);
cards;
. FG     5276     2009Q1     
. FG     5276     2009Q2     
V     FG     5276     2009Q3     
C     FG     5276     2009Q4     
. FG     5276     2010Q1     
. FG     5276     2010Q2     
. FG     5276     2010Q3     
. FG     5276     2010Q4
. FG     5276     2011Q1     
. FG     5276     2011Q2     
. FG     5276     2011Q3     
. FG     5276     2011Q4     
. FG     5276     2012Q1     
. FG     5276     2012Q2     
. FG     5276     2012Q3     
. FG     5276     2012Q4     
. FG     5276     2013Q1     
. FG     5276     2013Q2     
. FG     5276     2013Q3     
. FG     5276     2013Q4     
. FG     5276     2014Q1     
. FG     5276     2014Q2     
. KL     4636186     2009Q1     
. KL     4636186     2009Q2     
;
run;
data want;
 set have;
 by  ID1     ID2;
 length _status $ 1;
 retain  _status;
 if first.id2 then _status='V' ;
 if status='C' then _status='C';
run;



Xia Keshan

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 14 replies
  • 353 views
  • 0 likes
  • 6 in conversation