Hello,
I need to update a dataset. I get the first level of complication done:
data have;
input id concern$ morestuff$;
datalines;
1 . whatever
1 . other
2 . whatever
2 . other
3 code whatever
3 code other
4 code whatever
4 code other
5 code whatever
6 . whatever
;
run;
data transaction;
input id concern$;
datalines;
1 .
2 tx
3 .
4 tx
;
run;
data want; set have;
run; *Just to keep "have" and "want" separate;
proc sql;
update want master
set concern=(select concern from transaction trans
where master.id=trans.id)
where master.concern = "."; *As "code" can be any of 6,730 options;
*Now I need to add further complications: ;
data transaction2;
input id concern$ ;
datalines;
1 .
1 .
2 tx
2 txt
3 .
4 tx
;
run;
* "tx" or "txt" could be anything alphanumeric, number of options are hard to count;
data otherwant;
input id concern$ morestuff$;
datalines;
1 . whatever
1 . other
2 tx whatever
2 tx other
2 txt whatever
2 txt other
3 code whatever
3 code other
4 code whatever
4 code other
4 tx whatever
4 tx other
5 code whatever
6 . whatever
;
run;
And this is the point where I am stuck. I have tried some options, but none gave the right solution. As far as I can tell we are using SAS 9.4.
I hope somebody can help me out.
Anything is appreciated.
Gerit
I think I found a solution. I couldn't solve it in one step. Maybe there is a better way to solve it, but this gives the correct result.
* 1. Create data set with no missing "concern";
data trans2; set transaction2;
if concern ^= "";
run;
* 2. Check if "concern" is in "have2" as well as "trans2".;
data have2; set want;
newid = strip(id)||"."||strip(concern);
run;
data trans2; set trans2;
newid = strip(id)||"."||strip(concern);
run;
* Those in trans2 with no entry in have2 will have empty
cells for variable "morestuff";
proc sql;
create table test6 as
select coalesce(a.id,b.id) as id,
a.*,
b.morestuff
from trans2 a left join have2 b
on (a.newid eq b.newid);
quit;
* Keep only those of test6 (origin: trans2)
which have no entry in "morestuff", i.e.
those which are not (yet) in have2, but
with more information in trans2;
data interim; set test6;
keep id newid positive;
if morestuff = "" then positive = 1;
if positive = 1;
run;
* Transfer knowledge that there is more information
in trans2 for specific observations in have2 to have2;
data have2; merge have2 interim;
drop newid;
by id;
run;
* Single out those observations which have more information
in trans2. Keep those variables which need to be concottonated
to information in trans2. Here only "morestuff".;
data morestuff; set have2;
keep id morestuff;
if positive = 1;
run;
* Just in case that there can be duplicates;
proc sort data=morestuff;
by id morestuff;
run;
data shortmorestuff; set morestuff;
by id morestuff;
if first.morestuff then output;
run;
* Add all relevant information from shortmorestuff to
test6 (origin: trans2, but only relevant observations);
* First: Create data set similar to interim, but with
relevant information;
data relevant; set test6;
keep id concern ;
if morestuff = "";
run;
* Second: Bring together;
proc sql;
create table transwithadded as
select coalesce(a.id, b.id) as id, a.*, b.*
from relevant a natural full join shortmorestuff b
;
* Attach transwithadded to have2;
data result; set have2 transwithadded;
drop positive ;
run;
Looks like a simple full join to me.
proc sql; create table want as select coalsece(a.id,b.id) as id, a.morestuff, b.concern from have a full join transaction b on a.id=b.id; quit;
Thank you. However, this overwrites "code" in all cases which I need to remain.
proc sql; create table want as select coalsece(a.id,b.id) as id, a.morestuff, coalesce(a.concern,b.concern) as concern from have a full join transaction b on a.id=b.id; quit;
With this solution I am still missing the resulting observations
...
2 txt whatever
2 txt other
...
4 tx whatever
4 tx other
...
Whereas I need to keep all the observations which your latest code produced.
You will need to be clearer on what issues are. I cannot guess. I run this code, and it looks fine:
data otherwant; input id concern$ morestuff$; datalines; 1 . whatever 1 . other 2 tx whatever 2 tx other 2 txt whatever 2 txt other 3 code whatever 3 code other 4 code whatever 4 code other 4 tx whatever 4 tx other 5 code whatever 6 . whatever ; run; data have; input id concern$ morestuff$; datalines; 1 . whatever 1 . other 2 . whatever 2 . other 3 code whatever 3 code other 4 code whatever 4 code other 5 code whatever 6 . whatever ; run; proc sql; create table want as select coalesce(a.id,b.id) as id, a.morestuff, coalesce(a.concern,b.concern) as concern from have a full join otherwant b on a.id=b.id; quit;
Dataset "otherwant" is the wanted other result. I need to acchieve it from the datasets "have" and "transaction2".
I am very sorry about the confusion.
Am afraid your data is to blame here. Merging will only bring in at maximum, the product of the two available, i.e. the total obs. You want more records on that based on number of observations in one of the dataset - but only in some instances. Therefore you manually need to code this in, something like:
proc transpose data=have out=inter; by id; var concern; id morestuff; run; data inter; merge inter transaction (rename=(concern=c)); by id; if whatever="" and other="" then do; concern=c; output; concern=c; output; end; else if whatever ne "" and other ne "" then do; concern=whatever; output; concern=other; output; ... run;
I would really advise to look at the whole process and see if you can alter it, or provide more accurate test samples so we can see what it is you really want to do, and what you have.
Thank you for all your effort.
I will have to look into your solution for a while longer and try to make this work. It will take a while.
Thanks again.
I think I found a solution. I couldn't solve it in one step. Maybe there is a better way to solve it, but this gives the correct result.
* 1. Create data set with no missing "concern";
data trans2; set transaction2;
if concern ^= "";
run;
* 2. Check if "concern" is in "have2" as well as "trans2".;
data have2; set want;
newid = strip(id)||"."||strip(concern);
run;
data trans2; set trans2;
newid = strip(id)||"."||strip(concern);
run;
* Those in trans2 with no entry in have2 will have empty
cells for variable "morestuff";
proc sql;
create table test6 as
select coalesce(a.id,b.id) as id,
a.*,
b.morestuff
from trans2 a left join have2 b
on (a.newid eq b.newid);
quit;
* Keep only those of test6 (origin: trans2)
which have no entry in "morestuff", i.e.
those which are not (yet) in have2, but
with more information in trans2;
data interim; set test6;
keep id newid positive;
if morestuff = "" then positive = 1;
if positive = 1;
run;
* Transfer knowledge that there is more information
in trans2 for specific observations in have2 to have2;
data have2; merge have2 interim;
drop newid;
by id;
run;
* Single out those observations which have more information
in trans2. Keep those variables which need to be concottonated
to information in trans2. Here only "morestuff".;
data morestuff; set have2;
keep id morestuff;
if positive = 1;
run;
* Just in case that there can be duplicates;
proc sort data=morestuff;
by id morestuff;
run;
data shortmorestuff; set morestuff;
by id morestuff;
if first.morestuff then output;
run;
* Add all relevant information from shortmorestuff to
test6 (origin: trans2, but only relevant observations);
* First: Create data set similar to interim, but with
relevant information;
data relevant; set test6;
keep id concern ;
if morestuff = "";
run;
* Second: Bring together;
proc sql;
create table transwithadded as
select coalesce(a.id, b.id) as id, a.*, b.*
from relevant a natural full join shortmorestuff b
;
* Attach transwithadded to have2;
data result; set have2 transwithadded;
drop positive ;
run;
Well, if it works all the better There are always "better" ways, but working code is better than possibility.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.