Solved: Re: Modify, update or other tricks conditionally with two tables and a...

MsGeritO · Posted 11-22-2018 06:47 AM

Hello,

I need to update a dataset. I get the first level of complication done:

data have;
input id concern$ morestuff$;
datalines;
1 . whatever
1 . other
2 . whatever
2 . other
3 code whatever
3 code other
4 code whatever
4 code other
5 code whatever
6 . whatever
;
run;
data transaction;
input id concern$;
datalines;
1 .
2 tx
3 .
4 tx
;
run;

data want; set have;

run; *Just to keep "have" and "want" separate;

proc sql;
update want master
set concern=(select concern from transaction trans
where master.id=trans.id)
where master.concern = "."; *As "code" can be any of 6,730 options;

*Now I need to add further complications: ;

data transaction2;
input id concern$ ;
datalines;
1 .
1 .
2 tx
2 txt
3 .
4 tx
;
run;

* "tx" or "txt" could be anything alphanumeric, number of options are hard to count;

data otherwant;
input id concern$ morestuff$;
datalines;
1 . whatever
1 . other
2 tx whatever
2 tx other
2 txt whatever
2 txt other
3 code whatever
3 code other
4 code whatever
4 code other
4 tx whatever
4 tx other
5 code whatever
6 . whatever
;
run;

And this is the point where I am stuck. I have tried some options, but none gave the right solution. As far as I can tell we are using SAS 9.4.

I hope somebody can help me out.

Anything is appreciated.

Gerit

MsGeritO · Posted 11-28-2018 08:38 AM

I think I found a solution. I couldn't solve it in one step. Maybe there is a better way to solve it, but this gives the correct result.

* 1. Create data set with no missing "concern";
data trans2; set transaction2;
   if concern ^= "";
run;
* 2. Check if "concern" is in "have2" as well as "trans2".;
data have2; set want;
newid = strip(id)||"."||strip(concern);
run;
data trans2; set trans2;
newid = strip(id)||"."||strip(concern);
run;
* Those in trans2 with no entry in have2 will have empty
cells for variable "morestuff";
proc sql;
create table test6 as
select coalesce(a.id,b.id) as id,
         a.*,
       b.morestuff
from   trans2 a left join have2 b
   on     (a.newid eq b.newid);
quit;
* Keep only those of test6 (origin: trans2)
which have no entry in "morestuff", i.e.
those which are not (yet) in have2, but
with more information in trans2;
data interim; set test6;
   keep id newid positive;
   if morestuff = "" then positive = 1;
   if positive = 1;
run;
* Transfer knowledge that there is more information
in trans2 for specific observations in have2 to have2;
data have2; merge have2 interim;
   drop newid;
   by id;
run;
* Single out those observations which have more information
in trans2. Keep those variables which need to be concottonated
to information in trans2. Here only "morestuff".;
data morestuff; set have2;
   keep id morestuff;
   if positive = 1;
run;
* Just in case that there can be duplicates;
proc sort data=morestuff;
   by id morestuff;
run;
data shortmorestuff; set morestuff;
   by id morestuff;
   if first.morestuff then output;
run;
* Add all relevant information from shortmorestuff to
test6 (origin: trans2, but only relevant observations);
* First: Create data set similar to interim, but with
relevant information;
data relevant; set test6;
   keep id concern ;
   if morestuff = "";
run;
* Second: Bring together;
proc sql;
   create table transwithadded as
   select coalesce(a.id, b.id) as id, a.*, b.*
   from relevant a natural full join shortmorestuff b
   ;
* Attach transwithadded to have2;
data result; set have2 transwithadded;
drop positive ;
run;

View solution in original post

RW9 · Posted 11-22-2018 06:53 AM

Looks like a simple full join to me.

proc sql;
  create table want as
  select coalsece(a.id,b.id) as id,
         a.morestuff,
         b.concern
  from   have a
  full join transaction b
  on     a.id=b.id;
quit;

MsGeritO · Posted 11-22-2018 07:41 AM

Thank you. However, this overwrites "code" in all cases which I need to remain.

RW9 · Posted 11-22-2018 08:05 AM

proc sql;
  create table want as
  select coalsece(a.id,b.id) as id,
         a.morestuff,
         coalesce(a.concern,b.concern) as concern
  from   have a
  full join transaction b
  on     a.id=b.id;
quit;

MsGeritO · Posted 11-22-2018 08:31 AM

With this solution I am still missing the resulting observations

...

2 txt whatever
2 txt other
...
4 tx whatever
4 tx other

...

Whereas I need to keep all the observations which your latest code produced.

RW9 · Posted 11-22-2018 08:48 AM

You will need to be clearer on what issues are. I cannot guess. I run this code, and it looks fine:

data otherwant;
input id concern$ morestuff$;
datalines;
1 . whatever
1 . other
2 tx whatever
2 tx other
2 txt whatever
2 txt other
3 code whatever
3 code other
4 code whatever
4 code other
4 tx whatever
4 tx other
5 code whatever
6 . whatever
;
run;
data have;
input id concern$ morestuff$;
datalines;
1 . whatever
1 . other
2 . whatever
2 . other
3 code whatever
3 code other
4 code whatever
4 code other
5 code whatever
6 . whatever
;
run;
proc sql;
  create table want as
  select coalesce(a.id,b.id) as id,
         a.morestuff,
         coalesce(a.concern,b.concern) as concern
  from   have a
  full join otherwant b
  on     a.id=b.id;
quit;

MsGeritO · Posted 11-22-2018 09:00 AM

Dataset "otherwant" is the wanted other result. I need to acchieve it from the datasets "have" and "transaction2".

I am very sorry about the confusion.

RW9 · Posted 11-22-2018 09:29 AM

Am afraid your data is to blame here. Merging will only bring in at maximum, the product of the two available, i.e. the total obs. You want more records on that based on number of observations in one of the dataset - but only in some instances. Therefore you manually need to code this in, something like:

proc transpose data=have out=inter;
  by id;
  var concern;
  id morestuff;
run;

data inter;
  merge inter transaction (rename=(concern=c));
  by id;
  if whatever="" and other="" then do;
    concern=c;
    output;
    concern=c;
    output;
  end;
  else if whatever ne "" and other ne "" then do;
    concern=whatever;
    output;
    concern=other;
    output;
  ...
  
run;

I would really advise to look at the whole process and see if you can alter it, or provide more accurate test samples so we can see what it is you really want to do, and what you have.

MsGeritO · Posted 11-26-2018 08:27 AM

Thank you for all your effort.

I will have to look into your solution for a while longer and try to make this work. It will take a while.

Thanks again.

MsGeritO · Posted 11-28-2018 08:38 AM

I think I found a solution. I couldn't solve it in one step. Maybe there is a better way to solve it, but this gives the correct result.

* 1. Create data set with no missing "concern";
data trans2; set transaction2;
   if concern ^= "";
run;
* 2. Check if "concern" is in "have2" as well as "trans2".;
data have2; set want;
newid = strip(id)||"."||strip(concern);
run;
data trans2; set trans2;
newid = strip(id)||"."||strip(concern);
run;
* Those in trans2 with no entry in have2 will have empty
cells for variable "morestuff";
proc sql;
create table test6 as
select coalesce(a.id,b.id) as id,
         a.*,
       b.morestuff
from   trans2 a left join have2 b
   on     (a.newid eq b.newid);
quit;
* Keep only those of test6 (origin: trans2)
which have no entry in "morestuff", i.e.
those which are not (yet) in have2, but
with more information in trans2;
data interim; set test6;
   keep id newid positive;
   if morestuff = "" then positive = 1;
   if positive = 1;
run;
* Transfer knowledge that there is more information
in trans2 for specific observations in have2 to have2;
data have2; merge have2 interim;
   drop newid;
   by id;
run;
* Single out those observations which have more information
in trans2. Keep those variables which need to be concottonated
to information in trans2. Here only "morestuff".;
data morestuff; set have2;
   keep id morestuff;
   if positive = 1;
run;
* Just in case that there can be duplicates;
proc sort data=morestuff;
   by id morestuff;
run;
data shortmorestuff; set morestuff;
   by id morestuff;
   if first.morestuff then output;
run;
* Add all relevant information from shortmorestuff to
test6 (origin: trans2, but only relevant observations);
* First: Create data set similar to interim, but with
relevant information;
data relevant; set test6;
   keep id concern ;
   if morestuff = "";
run;
* Second: Bring together;
proc sql;
   create table transwithadded as
   select coalesce(a.id, b.id) as id, a.*, b.*
   from relevant a natural full join shortmorestuff b
   ;
* Attach transwithadded to have2;
data result; set have2 transwithadded;
drop positive ;
run;

RW9 · Posted 11-28-2018 09:11 AM

Well, if it works all the better There are always "better" ways, but working code is better than possibility.

Modify, update or other tricks conditionally with two tables and adding observations

Re: Modify, update or other tricks conditionally with two tables and adding observations

Re: Modify, update or other tricks conditionally with two tables and adding observations

Re: Modify, update or other tricks conditionally with two tables and adding observations

Re: Modify, update or other tricks conditionally with two tables and adding observations

Re: Modify, update or other tricks conditionally with two tables and adding observations

Re: Modify, update or other tricks conditionally with two tables and adding observations

Re: Modify, update or other tricks conditionally with two tables and adding observations

Re: Modify, update or other tricks conditionally with two tables and adding observations

Re: Modify, update or other tricks conditionally with two tables and adding observations

Re: Modify, update or other tricks conditionally with two tables and adding observations

Re: Modify, update or other tricks conditionally with two tables and adding observations

SAS Innovate 2025: Call for Content

Classroom Training Available!