BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
MsGeritO
Obsidian | Level 7

Hello,

 

I need to update a dataset. I get the first level of complication done:

 

data have;
input id concern$ morestuff$;
datalines;
1 . whatever
1 . other
2 . whatever
2 . other
3 code whatever
3 code other
4 code whatever
4 code other
5 code whatever
6 . whatever
;
run;
data transaction;
input id concern$;
datalines;
1 .
2 tx
3 .
4 tx
;
run;

 

data want; set have;

run; *Just to keep "have" and "want" separate;

 

proc sql;
   update want master
set concern=(select concern from transaction trans
where master.id=trans.id)
where master.concern = "."; *As "code" can be any of 6,730 options;

 

*Now I need to add further complications: ;

 

data transaction2;
input id concern$ ;
datalines;
1 .
1 .
2 tx
2 txt
3 .
4 tx
;
run;

 

* "tx" or "txt" could be anything alphanumeric, number of options are hard to count;

 

data otherwant;
input id concern$ morestuff$;
datalines;
1 . whatever
1 . other
2 tx whatever
2 tx other
2 txt whatever
2 txt other
3 code whatever
3 code other
4 code whatever
4 code other
4 tx whatever
4 tx other
5 code whatever
6 . whatever
;
run;

 

And this is the point where I am stuck. I have tried some options, but none gave the right solution. As far as I can tell we are using SAS 9.4.

 

I hope somebody can help me out.

 

Anything is appreciated.

 

Gerit

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
MsGeritO
Obsidian | Level 7

I think I found a solution. I couldn't solve it in one step. Maybe there is a better way to solve it, but this gives the correct result.

 


* 1. Create data set with no missing "concern";
data trans2; set transaction2;
   if concern ^= "";
run;
* 2. Check if "concern" is in "have2" as well as "trans2".;
data have2; set want;
  newid = strip(id)||"."||strip(concern);
run;
data trans2; set trans2;
  newid = strip(id)||"."||strip(concern);
run;
* Those in trans2 with no entry in have2 will have empty
  cells for variable "morestuff";
proc sql;
  create table test6 as
  select coalesce(a.id,b.id) as id,
         a.*,
         b.morestuff
  from   trans2 a left join have2 b
   on     (a.newid eq b.newid);
quit;
* Keep only those of test6 (origin: trans2)
  which have no entry in "morestuff", i.e.
  those which are not (yet) in have2, but
  with more information in trans2;
data interim; set test6;
   keep id newid positive;
   if morestuff = "" then positive = 1;
   if positive = 1;
run;
* Transfer knowledge that there is more information
  in trans2 for specific observations in have2 to have2;
data have2; merge have2 interim;
   drop newid;
   by id;
run;
* Single out those observations which have more information
  in trans2. Keep those variables which need to be concottonated
  to information in trans2. Here only "morestuff".;
data morestuff; set have2;
   keep id morestuff;
   if positive = 1;
run;
* Just in case that there can be duplicates;
proc sort data=morestuff;
   by id morestuff;
run;
data shortmorestuff; set morestuff;
   by id morestuff;
   if first.morestuff then output;
run;
* Add all relevant information from shortmorestuff to
  test6 (origin: trans2, but only relevant observations);
* First: Create data set similar to interim, but with
  relevant information;
data relevant; set test6;
   keep id concern ;
   if morestuff = "";
run;
* Second: Bring together;
proc sql;
   create table transwithadded as
   select coalesce(a.id, b.id) as id, a.*, b.*
   from relevant a natural full join shortmorestuff b
   ;
* Attach transwithadded to have2;
data result; set have2 transwithadded;
 drop positive ;
run;

View solution in original post

10 REPLIES 10
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Looks like a simple full join to me.

proc sql;
  create table want as
  select coalsece(a.id,b.id) as id,
         a.morestuff,
         b.concern
  from   have a
  full join transaction b
  on     a.id=b.id;
quit;
MsGeritO
Obsidian | Level 7

Thank you. However, this overwrites "code" in all cases which I need to remain.

RW9
Diamond | Level 26 RW9
Diamond | Level 26
proc sql;
  create table want as
  select coalsece(a.id,b.id) as id,
         a.morestuff,
         coalesce(a.concern,b.concern) as concern
  from   have a
  full join transaction b
  on     a.id=b.id;
quit;
MsGeritO
Obsidian | Level 7

With this solution I am still missing the resulting observations

 

...

2 txt whatever
2 txt other
...
4 tx whatever
4 tx other

...

 

Whereas I need to keep all the observations which your latest code produced.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

You will need to be clearer on what issues are.  I cannot guess.  I run this code, and it looks fine:

data otherwant;
input id concern$ morestuff$;
datalines;
1 . whatever
1 . other
2 tx whatever
2 tx other
2 txt whatever
2 txt other
3 code whatever
3 code other
4 code whatever
4 code other
4 tx whatever
4 tx other
5 code whatever
6 . whatever
;
run;
data have;
input id concern$ morestuff$;
datalines;
1 . whatever
1 . other
2 . whatever
2 . other
3 code whatever
3 code other
4 code whatever
4 code other
5 code whatever
6 . whatever
;
run;
proc sql;
  create table want as
  select coalesce(a.id,b.id) as id,
         a.morestuff,
         coalesce(a.concern,b.concern) as concern
  from   have a
  full join otherwant b
  on     a.id=b.id;
quit;
MsGeritO
Obsidian | Level 7

Dataset "otherwant" is the wanted other result. I need to acchieve it from the datasets "have" and "transaction2".

 

I am very sorry about the confusion.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Am afraid your data is to blame here.  Merging will only bring in at maximum, the product of the two available, i.e. the total obs.  You want more records on that based on number of observations in one of the dataset - but only in some instances.  Therefore you manually need to code this in, something like:

proc transpose data=have out=inter;
  by id;
  var concern;
  id morestuff;
run;

data inter;
  merge inter transaction (rename=(concern=c));
  by id;
  if whatever="" and other="" then do;
    concern=c;
    output;
    concern=c;
    output;
  end;
  else if whatever ne "" and other ne "" then do;
    concern=whatever;
    output;
    concern=other;
    output;
  ...
  
run;

I would really advise to look at the whole process and see if you can alter it, or provide more accurate test samples so we can see what it is you really want to do, and what you have.

MsGeritO
Obsidian | Level 7

Thank you for all your effort.

 

I will have to look into your solution for a while longer and try to make this work. It will take a while.

 

Thanks again.

MsGeritO
Obsidian | Level 7

I think I found a solution. I couldn't solve it in one step. Maybe there is a better way to solve it, but this gives the correct result.

 


* 1. Create data set with no missing "concern";
data trans2; set transaction2;
   if concern ^= "";
run;
* 2. Check if "concern" is in "have2" as well as "trans2".;
data have2; set want;
  newid = strip(id)||"."||strip(concern);
run;
data trans2; set trans2;
  newid = strip(id)||"."||strip(concern);
run;
* Those in trans2 with no entry in have2 will have empty
  cells for variable "morestuff";
proc sql;
  create table test6 as
  select coalesce(a.id,b.id) as id,
         a.*,
         b.morestuff
  from   trans2 a left join have2 b
   on     (a.newid eq b.newid);
quit;
* Keep only those of test6 (origin: trans2)
  which have no entry in "morestuff", i.e.
  those which are not (yet) in have2, but
  with more information in trans2;
data interim; set test6;
   keep id newid positive;
   if morestuff = "" then positive = 1;
   if positive = 1;
run;
* Transfer knowledge that there is more information
  in trans2 for specific observations in have2 to have2;
data have2; merge have2 interim;
   drop newid;
   by id;
run;
* Single out those observations which have more information
  in trans2. Keep those variables which need to be concottonated
  to information in trans2. Here only "morestuff".;
data morestuff; set have2;
   keep id morestuff;
   if positive = 1;
run;
* Just in case that there can be duplicates;
proc sort data=morestuff;
   by id morestuff;
run;
data shortmorestuff; set morestuff;
   by id morestuff;
   if first.morestuff then output;
run;
* Add all relevant information from shortmorestuff to
  test6 (origin: trans2, but only relevant observations);
* First: Create data set similar to interim, but with
  relevant information;
data relevant; set test6;
   keep id concern ;
   if morestuff = "";
run;
* Second: Bring together;
proc sql;
   create table transwithadded as
   select coalesce(a.id, b.id) as id, a.*, b.*
   from relevant a natural full join shortmorestuff b
   ;
* Attach transwithadded to have2;
data result; set have2 transwithadded;
 drop positive ;
run;

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, if it works all the better Smiley Happy  There are always "better" ways, but working code is better than possibility.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1624 views
  • 0 likes
  • 2 in conversation