Re: Wrong data step to flag data

tmm · Posted 01-27-2014 10:01 AM

I am trying to create a data step that will flag certain data from one table into another table. I tried this code but it does not do what I need it to:

data cci.phyclms11; set cci.phyclms10 cci.cci0 (keep proc1 proc2) key=proc1 proc2;

run;

DATA HAVE TABLE 1			DATA HAVE TABLE 2
proc1	proc2	edit	CLAIM	PROC1	PROC2	NET1	NET2	ADJD1	ADJD2	ADJDT1	ADJDT2
100	G0427	0	111	100	3032	5	5	111	873	10/10/2013	10/10/2013
102	0311T	0	222	100	G0427	5	5	111	873	10/11/2013	10/11/2013
102	36010	0	333	102	0	19	15	111	873	10/12/2013	10/12/2013
102	76998	0	444	102	36010	20	20	111	873	10/13/2013	10/13/2013
DATA NEED
CLAIM	PROC1	PROC2	NET1	NET2	ADJD1	ADJD2	ADJDT1	ADJDT2	TBL1 PROC1	TBL1 PROC2	TBL1 EDIT
111	100	3032	5	5	111	873	10/10/2013	10/10/2013
222	100	G0427	5	5	111	873	10/11/2013	10/11/2013	100	G0427	0
333	102	0	19	15	111	873	10/12/2013	10/12/2013
444	102	36010	20	20	111	873	10/13/2013	10/13/2013	102	36010	0
OR
CLAIM	PROC1	PROC2	NET1	NET2	ADJD1	ADJD2	ADJDT1	ADJDT2	FLAG
111	100	3032	5	5	111	873	10/10/2013	10/10/2013	0
222	100	G0427	5	5	111	873	10/11/2013	10/11/2013	1
333	102	0	19	15	111	873	10/12/2013	10/12/2013	0
444	102	36010	20	20	111	873	10/13/2013	10/13/2013	1

I was thnking how I could just somehow do a data step and an if statement on the proc1 = proc1 and proc2=proc2 and say if they are equal flag 1 if not then 0.

Not entirely sure.

art297 · Posted 01-27-2014 11:09 AM

You want to MERGE the two files. As long as there aren't any many-to-many merge cases, you can do that in a data step using the merge statement. See: Step-by-Step Programming with Base SAS(R) Software

CTorres · Posted 01-27-2014 11:13 AM

yChanged the SET Try this:

data cci.phyclms11;

merge cci.phyclms10 (in=T1) cci.cci0 (drop=edit in=T2);

by proc1 proc2;

flag= t1 and t2;

if t1;

run;

Regards,

Murray_Court · Posted 01-27-2014 11:13 AM

It looks like you are only actually merging on proc1 from your example.

SQL should do the trick.

proc sql;

create table want a select a.*, b.proc1 as tbl1_proc1, b.proc2 as tbl1_proc2

from

have1 as b

inner join

have2 as a

on

a.proc1=b.proc1

;

quit;

Patrick · Posted 01-27-2014 02:52 PM

You want to do a table lookup. Below 2 tested code options of how you can achieve this. A "Merge" as Art suggests would be another alternative - and if you Google you would also find examples using the data step Array statement or for a single key/value pair lookup the use of SAS formats.

/* SQL option */
proc sql;
create table want1 as
    select h.*, case(l.edit) when '0' then 1 else 0 end as flag
      from have h left join lookup l
        on h.proc1=l.proc1 and h.proc2=l.proc2
;
quit;

/* SAS Hash option */
data want2(drop=_:);
set have;

if _n_=1 then
    do;
      declare hash h(dataset:'lookup(keep=proc1 proc2)');
      _rc=h.defineKey('proc1', 'proc2');
      _rc=h.defineDone();
    end;

flag= (h.check()=0);
run;

Your proposed solution won't work as an interleaving set statement reads only from one table in one iteration of a data step. Using the sample data your result table would contain 8 rows.

I believe it should be a left-join. Using an inner join would result in 2 rows only.

As Art mentioned: If using a join/merge make sure the (composite) key used for joining is unique in your look-up table.

When loading a SAS Hash de-duplication happens automatically if not using "multidata". Also: In a normal scenario a lookup table is normally rather small but the base table can be quite big. Loading the lookup table into a hash avoids the need for sorting (either explicit as preparation for a "merge" or implicit done by the SQL when joining the tables). Coding a hash is a bit more typing but gives you often better performance.

CTorres · Posted 01-27-2014 03:45 PM

Hi Patrick,

You are right. The instructuion I meant was a MERGE instead a SET.

I already Edited the original message.

My solution gives only four observations and asumes that both datasets are sorted by proc1 proc2.

My test program follows:

DATA TABLE1;
input PROC1 PROC2 $ EDIT;
Datalines;
100 G0427 0
102 0311T 0
102 36010 0
102 76998 0
;
run;

DATA TABLE2 ;
input CLAIM PROC1 PROC2 $ NET1 NET2 ADJD1 ADJD2 ADJDT1 $10. ADJDT2 $10.;
Datalines;
111 100 3032 5 5 111 873 10/10/2013 10/10/2013
222 100 G0427 5 5 111 873 10/11/2013 10/11/2013
333 102 0 19 15 111 873 10/12/2013 10/12/2013
444 102 36010 20 20 111 873 10/13/2013 10/13/2013
;
run;

Data need;
merge table2 (in=t2) table1(drop=edit in=t1);
by proc1 proc2;
flag=t1 and t2;
if t2;
run;

Regards,

Wrong data step to flag data