BookmarkSubscribeRSS Feed
clinsas
Calcite | Level 5

I want to validate my data set with certain condition where i am i am validating two rows of one fields. I want to create a data set with below listed conditions.(sample .csv attached)

 

1)if act ship has a date then act=Results should also have date(if date is present for ship and missing for Results then i want to output such records)

if act=ship and not missing(date) and act=result and missing(date);


2)if act ship has a date then then all pars of act Gp except rq should also have Value(if date is present for ship but value is missng for GP(x,xx,xxx,a,aa,aaa,c,cc,ccc,cq,freq) then i want to output such records)

if act=ship and not missing(date) and act=Gp and pars in (x,xx,xxx,a,aa,aaa,c,cc,ccc,cq,freq) and missing(values);

 

3)if act ship has a date then Pars 'case2' of act 'tot' should also have value(if date is present for ship and value is missng for case2 then i want to output such records)

if act=ship and not missing(date) and act=tot and pars=case2 and missing(Value);

 

 

 

Thanks in advance

 

3 REPLIES 3
PeterClemmensen
Tourmaline | Level 20

Hi and welcome to the SAS community 🙂

 

I have converted your csv file into a data step. This makes it much easier for users in here to provide a usable code answer for you.

 

Now, you have listed a few guidelines for your result. But if you really want to help us help you, post what your desired resuls looks like from this data set.

 

data have;
input id type DT date $ act $ pars $ Value;
infile datalines dlm=',' dsd;
datalines;
1,1,9,04JUL2016,Ship,,
1,1,9,U,Ship2,,
1,1,9,U,Ship3,,
1,1,10,U,GP,x,
1,1,10,U,GP,xx,
1,1,10,U,GP,xxx,
1,1,10,U,GP,a,
1,1,10,U,GP,aa,
1,1,10,U,GP,aaa,
1,1,10,U,GP,c,
1,1,10,U,GP,ccc,
1,1,10,U,GP,ccc,
1,1,10,U,GP,rq,
1,1,10,U,GP,cq,
1,1,10,U,GP,freq,
1,1,12,,Results,,
1,1,15,21SEP2016,tot,case1,0
1,1,15,21SEP2016,tot,case2,
2,1,9,11MAY2016,Ship,,
2,1,9,U,Ship2,,
2,1,9,U,Ship3,,
2,1,10,U,GP,x,1
2,1,10,U,GP,xx,1
2,1,10,U,GP,xxx,1
2,1,10,U,GP,a,1
2,1,10,U,GP,aa,1
2,1,10,U,GP,aaa,1
2,1,10,U,GP,c,1
2,1,10,U,GP,cc,1
2,1,10,U,GP,ccc,1
2,1,10,U,GP,rq,
2,1,10,U,GP,cq,1
2,1,10,U,GP,freq,1
2,1,12,21SEP2016,Results,,
2,1,15,21SEP2016,tot,case1,0
2,1,15,21SEP2016,tot,case2,2
;
clinsas
Calcite | Level 5

I want to validate my data set with certain condition where i am i am validating two rows of one fields. I want to create a data set with below listed conditions.(sample .csv attached)

 

1)if act ship has a date then act=Results should also have date(if date is present for ship and missing for Results then i want to output such records)

if act=ship and not missing(date) and act=result and missing(date);


2)if act ship has a date then then all pars of act Gp except rq should also have Value(if date is present for ship but value is missng for GP(x,xx,xxx,a,aa,aaa,c,cc,ccc,cq,freq) then i want to output such records)

if act=ship and not missing(date) and act=Gp and pars in (x,xx,xxx,a,aa,aaa,c,cc,ccc,cq,freq) and missing(values);

 

3)if act ship has a date then Pars 'case2' of act 'tot' should also have value(if date is present for ship and value is missng for case2 then i want to output such records)

if act=ship and not missing(date) and act=tot and pars=case2 and missing(Value);

 

 

The output should look something like this or something more meaningful.

like for id 1 act=ship has date but act=result date is missing.

Ship has date but value is missing for case2 and pars of GP.(exclude rq)

we can add a flag as identifier. 

id   type  DT     date       act                pars Value flag1
1    1       9   04-Jul-16 Ship                                   1
1    1     12                    Results                              1
1    1     10                    GP                     x              1
1    1    10                     GP                    xx             1
1    1    10                     GP                   xxx            1
1    1    10                     GP                     a             1
1    1    10                     GP                   aa             1
1    1    10                     GP                  aaa            1
1    1    10                     GP                   c               1
1    1    10                     GP                 ccc              1
1    1    10                     GP                 ccc              1
1    1   15    21-Sep-16 tot                 case2           1

 

 

sample dataset below.

data have;
input id type DT date $ act $ pars $ Value;
infile datalines dlm=',' dsd;
datalines;1,1,9,04JUL2016,Ship,,
1,1,9,U,Ship2,,
1,1,9,U,Ship3,,
1,1,10,U,GP,x,
1,1,10,U,GP,xx,
1,1,10,U,GP,xxx,
1,1,10,U,GP,a,
1,1,10,U,GP,aa,
1,1,10,U,GP,aaa,
1,1,10,U,GP,c,
1,1,10,U,GP,ccc,
1,1,10,U,GP,ccc,
1,1,10,U,GP,rq,
1,1,10,U,GP,cq,
1,1,10,U,GP,freq,
1,1,12,,Results,,
1,1,15,21SEP2016,tot,case1,0
1,1,15,21SEP2016,tot,case2,
2,1,9,11MAY2016,Ship,,
2,1,9,U,Ship2,,
2,1,9,U,Ship3,,
2,1,10,U,GP,x,1
2,1,10,U,GP,xx,1
2,1,10,U,GP,xxx,1
2,1,10,U,GP,a,1
2,1,10,U,GP,aa,1
2,1,10,U,GP,aaa,1
2,1,10,U,GP,c,1
2,1,10,U,GP,cc,1
2,1,10,U,GP,ccc,1
2,1,10,U,GP,rq,
2,1,10,U,GP,cq,1
2,1,10,U,GP,freq,1
2,1,12,21SEP2016,Results,,
2,1,15,21SEP2016,tot,case1,0
2,1,15,21SEP2016,tot,case2,2;

 

Thanks and Regards

clinsas
Calcite | Level 5

Just want to add extra notes that its not necessary to apply all all filters in one go we can split the filter conditions  like
1) if act=ship has a date and act=result has nissng(date) then output1.
2)if   act=ship has a date then parameter case2 of act=tot  shoud have value if not the output2;  and so on.