BookmarkSubscribeRSS Feed
kp19
Fluorite | Level 6

I am working on this complex query where I need to identify the organism for a patient and look for the same organism at the subsequent visits and flag the value if it's not there. More explanation is given after the data. Here is what data looks like:

 Actual Data:

ID  Visit   Org1   Resolution1   Org2   Resolution2   Org3   Resolution3   Org4   Resolution4

1      1      121          Yes           220            No          225            No           301         No

1      2      220          Yes           301            No

1      3     

1      4

 

So, in the data example above, there are 4 organisms reported for ID=1 at Visit=1 and 3 of these organisms did not resolve at this visit(subsequent resolution=no). I need to check if these unresolved organisms were reported at visit=2 or not. If they are not reported than that visit should be flagged with missing organism. In the example, organisms 220.225 and 301 were not resolved at the 1st visit and therefore should be checked at 2nd visit. Now at 2nd visit organisms 220 and 301 were reported but not 225 so the corresponding flag variable should report this organism. Also at the 2nd visit 301 was not resolved so we need to check if 301 was reported at 3rd visit. In the example, 301 was not reported at 3rd visit and therefore corresponding flag variable should report this.

 

Here is the challenging part though:

1.) Each patient might have somewhere 1 to 9 visits and it varies from patient to patient.

2.) Each patient might have up to 9 organisms reported at any visits.

3.) The organism flag variable Org_Flag should have multiple organisms if query returns more than one missing organism at any visit. eg 220,225 for the same visit.

 

Here is the data I want:

ID  Visit   Org1   Resolution1   Org2   Resolution2   Org3   Resolution3   Org4   Resolution4    Org_Flag

1      1      121          Yes           220            No          225            No           301         No               

1      2      220          Yes           301            No                                                                                 225

1      3                                                                                                                                               301

1      4

 

Any help would be appreciated with this query.

5 REPLIES 5
Astounding
PROC Star

Based on a quick look, I would guess that the structure of your data makes this a difficult problem.  Normalize your data, and it turns into an easy problem.  To normalize:

 

data normalized;
   set have;
   array org {9};
   array res {9} resolution1-resolution9;
   do k=1 to 9;
      if if res{k} > ' ' then do;
         resolution = res{k};
         organism = org{k};
         output;
      end;
   end;
   keep id visit organism resolution;
run;

Having done that, it should be easy to sort by ID ORGANISM VISIT and find the answers you are looking for.

 

 

 

Patrick
Opal | Level 21

@Astounding 

Also that I fully agree with you that a normalized data structure is often better to work with, I can't really think of a coding approach for the current use case which gets much simpler when using a narrow data structure.

In the code you've posted you're dropping visits 3 & 4. I've fixed that in below code. 

Can you please outline how a simple coding approach for a narrow data structure could look like? Something that returns orgs 555 and 301 for visit 3 if using the data below.

data have;
  infile cards truncover;
  input ID Visit Org1 Resolution1 $ Org2 Resolution2 $ Org3 Resolution3 $ Org4 Resolution4 $;
  cards;
1 1 121 Yes 220 No 225 No 301 No
1 2 220 Yes 301 No 555 No
1 3
1 4
;
run;

data normalized;
   set have;
   array _orgs {*} org:;
   array _res {*} resolution:;
   if cmiss(of _orgs[*])=dim(_orgs) then output;
   else
     do k=1 to dim(_orgs);
        if _res{k} > ' ' then do;
           resolution = _res{k};
           organism = _orgs{k};
           output;
        end;
     end;
   keep id visit organism resolution;
run;

 

Astounding
PROC Star

@Patrick ,

 

With normalized data, the structure of the flags might change a little.  But it should be simple, something along these lines:

 

proc sort data=normalized;

  by id organism visit;

run;

 

data check_these;

   set normalized;

   by id organism visit;

   if last.organism and resolution="No";

run;

 

So in this case, I didn't even bother to create flags.  I output only the cases that remained unresolved.

 

There's a more complex situation that I didn't bother to investigate.  It might be possible that an organism has "No" at visit 1, doesn't appear at visit 2, and has "Yes" at visit 3.  That would take a little more programming ... think DIF(VISIT) ... but shouldn't be that difficult to identify.

novinosrin
Tourmaline | Level 20
data have;
infile cards truncover;
input ID  Visit   Org1   Resolution1 $  Org2   Resolution2 $  Org3   Resolution3 $  Org4   Resolution4 $;
cards;
1      1      121          Yes           220            No          225            No           301         No
1      2      220          Yes           301            No
1      3      .				.			.				.
1      4
;

data want;
set have;
by id;
array t(9) _temporary_;
if first.id then call missing(of t(*));
array or(*) org:;
array r(*) resolution:;
retain t;
n=0;
if visit=1 then 
do;
	do i=1 to dim(r);
		res=r(I);
		org=or(i);
		if  cmiss(res,org)=0 and res='No' then 
			do;
				n+1;
				t(n)=org;
			end;
	end;
end;
else do;
do i=1 to dim(t);
	if t(i)>. then k=whichn(t(i),of or(*));
	if k>0 then if r(k)='Yes' then call missing(t(i));
	if k=0 then do; flag=t(i);call missing(t(i));leave;end;
end;
end;
drop n k res org i;
run;
Patrick
Opal | Level 21

@kp19 

Below an approach using a SAS Hash table lookup.

data have;
  infile cards truncover;
  input ID Visit Org1 Resolution1 $ Org2 Resolution2 $ Org3 Resolution3 $ Org4 Resolution4 $;
  cards;
1 1 121 Yes 220 No 225 No 301 No
1 2 220 Yes 301 No 555 No
1 3
1 4
;
run;

data want(drop=_:);
  set have;
  by id visit;
  length Org_Flag $40;
  array orgs {*} org1-org4;
  array reso {*} resolution1-resolution4;

  if _n_=1 then
    do;
      if 0 then _org=org1;
      dcl hash h1(ordered:'y');
      h1.defineKey('id','visit','_org');
      h1.defineData('_org');
      h1.defineDone();
      dcl hiter hh1('h1');
    end;

  _lag_visit=lag(visit);
  if not first.id then 
    do;
      /* remove Orgs from previous visit also reported in current visit */
      if _lag_visit=visit-1 then
        do;
          do _i=1 to dim(orgs) while (not missing(orgs[_i]));
            if h1.check(key:id,key:(visit-1),key:orgs[_i])=0 then h1.remove(key:id,key:(visit-1),key:orgs[_i]);
          end;
        end;
      else
        do;
          _rc=h1.clear();
        end;

      /* Org_Flas: Orgs reported in previous visit with resolution of No but not reported in current visit */
      _rc = hh1.first();
      do while (_rc = 0);
        Org_Flag=catx(',',Org_Flag,_org);
        _rc = hh1.next();
      end;
    end;

  /* populate hash with Orgs from current visit reported as No for check in next visit */
  _rc=h1.clear();
  do _i=1 to dim(orgs) while (not missing(orgs[_i]));
    if reso[_i]='No' then h1.ref(key:id,key:visit,key:orgs[_i],data:orgs[_i]);
  end;
run;

proc print data=want;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 913 views
  • 3 likes
  • 4 in conversation