Dear All,
I would like to combine two rows of data with missing information in different rows.
For example:
Jack / 500 / head / / 23
Jack / / head / patil / 23
Tim / 100 / shoulder/ / 24
Tim / 100 / / Dufwenberg /
The end result would be:
Jack / 500 / head / patil / 23
Tim / 100 / shoulder/ Dufwenberg / 24
Any suggestions of how to perform this in SAS?
Thank you,
Jack
@jackmelbourne wrote:
Yes, exactly.
Updating to last non-missing is standard UPDATE statement. The conditional min/max makes for a bit of extra code for AGE.
data have;
input Name :$upcase4. Age Net Reg Loss;
cards;
home 300 200 50 .
home 100 80 . 5
TDR 80 75 . 3
TDR 70 . 5 .
;;;;
run;
proc print;
run;
data want;
if 0 then set have;
update have(keep=name obs=0) have(drop=age);
by name;
if first.name then age=.;
set have(keep=age rename=(age=_age));
select(name);
when('HOME') age = min(age,_age);
when('TDR') age = max(age,_age);
end;
drop _age;
run;
proc print;
run;
Is this the only case you have, duplicates on two rows, and all non missing values are the same?
I would try to use SQL with max() for this specific scenario.
Well, not typing that in, so this is just examples (note: post test data in the form of a datastep for better answers):
Sort and merge each variable with column1
Use retain statements in one datatep and only output at last.column1
Transpose the data and sort nodupkey.
Sorry, maybe I could make my question a bit clearer with this dataset?
For example:
dataset: housing
Name Age Net Reg Loss
home 300 200 50 .
home 100 80 . 5
TDR 80 75 . 3
TDR 70 . 5 .
I would like to transform the housing data set to look like this
Name Age Net Reg Loss
home 100 80 50 5
TDR 80 75 5 3
What code would you recommend?
@jackmelbourne wrote:
Sorry, maybe I could make my question a bit clearer with this dataset?
For example:
dataset: housing
Name Age Net Reg Loss
home 300 200 50 .
home 100 80 . 5
TDR 80 75 . 3
TDR 70 . 5 .
I would like to transform the housing data set to look like this
Name Age Net Reg Loss
home 100 80 50 5
TDR 80 75 5 3
What code would you recommend?
Why does age 100 update 300 for NAME=HOME while age 70 does not update 80 for NAME=TDR?
Hi,
Sorry, should have made that clearer.
I need the lower value for home and the higher value for TDR when there are values for both observations.
Thanks,
Jack
@jackmelbourne wrote:
Hi,
Sorry, should have made that clearer.
I need the lower value for home and the higher value for TDR when there are values for both observations.
Thanks,
Jack
And the other variables update to last non-missing?
Yes, exactly.
@jackmelbourne wrote:
Yes, exactly.
Updating to last non-missing is standard UPDATE statement. The conditional min/max makes for a bit of extra code for AGE.
data have;
input Name :$upcase4. Age Net Reg Loss;
cards;
home 300 200 50 .
home 100 80 . 5
TDR 80 75 . 3
TDR 70 . 5 .
;;;;
run;
proc print;
run;
data want;
if 0 then set have;
update have(keep=name obs=0) have(drop=age);
by name;
if first.name then age=.;
set have(keep=age rename=(age=_age));
select(name);
when('HOME') age = min(age,_age);
when('TDR') age = max(age,_age);
end;
drop _age;
run;
proc print;
run;
Perfect, thank you so much! Seems to work 🙂
(simplistic 🙂 ) solution:
Data A;
Format Name $8. Value 8. BodyPart $8. UnknownVar $12. Value2 8.;
Input Name $ Value BodyPart $ UnknownVar $ Value2;
Datalines;
Jack 500 head . 23
Jack . head patil 23
Tim 100 shoulder . 24
Tim 100 . Dufwenberg .
;
* ^^^ check if this is similar to your data here ..;
Data B (Keep=Dummy:
Rename=(DummyName=Name
DummyValue=Value
DummyBodyPart=BodyPart
DummyUnknownVar=UnknownVar
DummyValue2=Value2));
Set A;
Format DummyName $8. DummyValue 8. DummyBodyPart $8. DummyUnknownVar $12. DummyValue2 8.;
Retain DummyName DummyValue DummyBodyPart DummyUnknownVar DummyValue2;
By /*Notsorted*/ Name;
If not Missing (Name) Then DummyName=Name; * hm, different var types are a bit awkward ..;
If not Missing (Value) Then DummyValue=Value; * .. could try array otherwise ..;
If not Missing (BodyPart) Then DummyBodyPart=BodyPart; * .. or maybe call execute;
If not Missing (UnknownVar) Then DummyUnknownVar=UnknownVar;
If not Missing (Value2) Then DummyValue2=Value2;
If Last.Name Then Output;
Run;
Keep the last non-missing value for each variable
I think there is an error in you 'want' table
TDR(AGE)=70 not 80
HAVE
====
Up to 40 obs WORK.HAVE total obs=4
Obs NAME AGE NET REG LOSS
1 home 300 200 50 .
2 home 100 80 . 5
3 TDR 80 75 . 3
4 TDR 70 . 5 .
WANT
====
Obs NAME AGE NET REG LOSS
1 home 100 80 50 5
2 TDR 70 75 5 3
SOLUTION
* I set this up to work with an arbitrary number of numeric columns;
* Do not need disubl if you hardcode variable names;
* I wish SAS would make the number of coluns available at compilation time;
data have;
input Name $ Age Net Reg Loss;
cards4;
home 300 200 50 .
home 100 80 . 5
TDR 80 75 . 3
TDR 70 . 5 .
;;;;
run;quit;
data _null_;
set have(obs=1);
array nums _numeric_;
call symputx('nums',put(dim(nums),5.));
rc=dosubl('
data want;
set have;
by name notsorted;
array nums _numeric_;
array savs[&nums.] _temporary_;
do i=1 to dim(nums);
if nums[i] ne . then savs[i]=nums[i];
end;
do i=1 to dim(nums);
nums[i]=savs[i];
end;
if last.name then output;
run;quit;
');
run;quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.