Dear All,
I would like to combine two rows of data with missing information in different rows.
For example:
Jack / 500 / head / / 23
Jack / / head / patil / 23
Tim / 100 / shoulder/ / 24
Tim / 100 / / Dufwenberg /
The end result would be:
Jack / 500 / head / patil / 23
Tim / 100 / shoulder/ Dufwenberg / 24
Any suggestions of how to perform this in SAS?
Thank you,
Jack
@jackmelbourne wrote:
Yes, exactly.
Updating to last non-missing is standard UPDATE statement. The conditional min/max makes for a bit of extra code for AGE.
data have;
input Name :$upcase4. Age Net Reg Loss;
cards;
home 300 200 50 .
home 100 80 . 5
TDR 80 75 . 3
TDR 70 . 5 .
;;;;
run;
proc print;
run;
data want;
if 0 then set have;
update have(keep=name obs=0) have(drop=age);
by name;
if first.name then age=.;
set have(keep=age rename=(age=_age));
select(name);
when('HOME') age = min(age,_age);
when('TDR') age = max(age,_age);
end;
drop _age;
run;
proc print;
run;
Is this the only case you have, duplicates on two rows, and all non missing values are the same?
I would try to use SQL with max() for this specific scenario.
Well, not typing that in, so this is just examples (note: post test data in the form of a datastep for better answers):
Sort and merge each variable with column1
Use retain statements in one datatep and only output at last.column1
Transpose the data and sort nodupkey.
Sorry, maybe I could make my question a bit clearer with this dataset?
For example:
dataset: housing
Name Age Net Reg Loss
home 300 200 50 .
home 100 80 . 5
TDR 80 75 . 3
TDR 70 . 5 .
I would like to transform the housing data set to look like this
Name Age Net Reg Loss
home 100 80 50 5
TDR 80 75 5 3
What code would you recommend?
@jackmelbourne wrote:
Sorry, maybe I could make my question a bit clearer with this dataset?
For example:
dataset: housing
Name Age Net Reg Loss
home 300 200 50 .
home 100 80 . 5
TDR 80 75 . 3
TDR 70 . 5 .
I would like to transform the housing data set to look like this
Name Age Net Reg Loss
home 100 80 50 5
TDR 80 75 5 3
What code would you recommend?
Why does age 100 update 300 for NAME=HOME while age 70 does not update 80 for NAME=TDR?
Hi,
Sorry, should have made that clearer.
I need the lower value for home and the higher value for TDR when there are values for both observations.
Thanks,
Jack
@jackmelbourne wrote:
Hi,
Sorry, should have made that clearer.
I need the lower value for home and the higher value for TDR when there are values for both observations.
Thanks,
Jack
And the other variables update to last non-missing?
Yes, exactly.
@jackmelbourne wrote:
Yes, exactly.
Updating to last non-missing is standard UPDATE statement. The conditional min/max makes for a bit of extra code for AGE.
data have;
input Name :$upcase4. Age Net Reg Loss;
cards;
home 300 200 50 .
home 100 80 . 5
TDR 80 75 . 3
TDR 70 . 5 .
;;;;
run;
proc print;
run;
data want;
if 0 then set have;
update have(keep=name obs=0) have(drop=age);
by name;
if first.name then age=.;
set have(keep=age rename=(age=_age));
select(name);
when('HOME') age = min(age,_age);
when('TDR') age = max(age,_age);
end;
drop _age;
run;
proc print;
run;
Perfect, thank you so much! Seems to work 🙂
(simplistic 🙂 ) solution:
Data A;
Format Name $8. Value 8. BodyPart $8. UnknownVar $12. Value2 8.;
Input Name $ Value BodyPart $ UnknownVar $ Value2;
Datalines;
Jack 500 head . 23
Jack . head patil 23
Tim 100 shoulder . 24
Tim 100 . Dufwenberg .
;
* ^^^ check if this is similar to your data here ..;
Data B (Keep=Dummy:
Rename=(DummyName=Name
DummyValue=Value
DummyBodyPart=BodyPart
DummyUnknownVar=UnknownVar
DummyValue2=Value2));
Set A;
Format DummyName $8. DummyValue 8. DummyBodyPart $8. DummyUnknownVar $12. DummyValue2 8.;
Retain DummyName DummyValue DummyBodyPart DummyUnknownVar DummyValue2;
By /*Notsorted*/ Name;
If not Missing (Name) Then DummyName=Name; * hm, different var types are a bit awkward ..;
If not Missing (Value) Then DummyValue=Value; * .. could try array otherwise ..;
If not Missing (BodyPart) Then DummyBodyPart=BodyPart; * .. or maybe call execute;
If not Missing (UnknownVar) Then DummyUnknownVar=UnknownVar;
If not Missing (Value2) Then DummyValue2=Value2;
If Last.Name Then Output;
Run;
Keep the last non-missing value for each variable
I think there is an error in you 'want' table
TDR(AGE)=70 not 80
HAVE
====
Up to 40 obs WORK.HAVE total obs=4
Obs NAME AGE NET REG LOSS
1 home 300 200 50 .
2 home 100 80 . 5
3 TDR 80 75 . 3
4 TDR 70 . 5 .
WANT
====
Obs NAME AGE NET REG LOSS
1 home 100 80 50 5
2 TDR 70 75 5 3
SOLUTION
* I set this up to work with an arbitrary number of numeric columns;
* Do not need disubl if you hardcode variable names;
* I wish SAS would make the number of coluns available at compilation time;
data have;
input Name $ Age Net Reg Loss;
cards4;
home 300 200 50 .
home 100 80 . 5
TDR 80 75 . 3
TDR 70 . 5 .
;;;;
run;quit;
data _null_;
set have(obs=1);
array nums _numeric_;
call symputx('nums',put(dim(nums),5.));
rc=dosubl('
data want;
set have;
by name notsorted;
array nums _numeric_;
array savs[&nums.] _temporary_;
do i=1 to dim(nums);
if nums[i] ne . then savs[i]=nums[i];
end;
do i=1 to dim(nums);
nums[i]=savs[i];
end;
if last.name then output;
run;quit;
');
run;quit;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.