BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jackmelbourne
Fluorite | Level 6

Dear All,

 

I would like to combine two rows of data with missing information in different rows.

 

For example:

 

Jack / 500 / head /     / 23

Jack /       / head / patil / 23

Tim / 100 / shoulder/     / 24

Tim / 100 /              / Dufwenberg /   

 

The end result would be:

Jack / 500 / head / patil / 23

Tim / 100 / shoulder/ Dufwenberg / 24

 

Any suggestions of how to perform this in SAS?

 

Thank you,

Jack

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

@jackmelbourne wrote:

Yes, exactly.


Updating to last non-missing is standard UPDATE statement.  The conditional min/max makes for a bit of extra code for AGE.

 

data have;
   input Name :$upcase4. Age Net Reg Loss;
   cards;
home 300 200 50     .
home 100 80   .     5
TDR  80  75   .     3
TDR  70   .   5     .
;;;;
   run;
proc print;
   run;
data want;
   if 0 then set have;
   update have(keep=name obs=0) have(drop=age);
   by name;
   if first.name then age=.;
   set have(keep=age rename=(age=_age));
   select(name);
      when('HOME') age = min(age,_age);
      when('TDR')  age = max(age,_age);
      end;
   drop _age;
   run;
proc print;
   run;

Capture.PNG

View solution in original post

11 REPLIES 11
LinusH
Tourmaline | Level 20

Is this the only case you have, duplicates on two rows, and all non missing values are the same?

I would try to use SQL with max() for this specific scenario.

Data never sleeps
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, not typing that in, so this is just examples (note: post test data in the form of a datastep for better answers):

Sort and merge each variable with column1

Use retain statements in one datatep and only output at last.column1

Transpose the data and sort nodupkey.

jackmelbourne
Fluorite | Level 6

Sorry, maybe I could make my question a bit clearer with this dataset?

 

For example:

 

dataset: housing

 

Name       Age      Net     Reg     Loss

home        300        200     50        . 

home        100        80        .          5

TDR          80          75        .         3

TDR          70           .          5        .

 

I would like to transform the housing data set to look like this

 

Name       Age      Net     Reg     Loss

home        100        80      50         5

TDR          80          75       5         3

 

What code would you recommend?

 

 

data_null__
Jade | Level 19

@jackmelbourne wrote:

Sorry, maybe I could make my question a bit clearer with this dataset?

 

For example:

 

dataset: housing

 

Name       Age      Net     Reg     Loss

home        300        200     50        . 

home        100        80        .          5

TDR          80          75        .         3

TDR          70           .          5        .

 

I would like to transform the housing data set to look like this

 

Name       Age      Net     Reg     Loss

home        100        80      50         5

TDR          80          75       5         3

 

What code would you recommend?

 

 


Why does age 100 update 300 for NAME=HOME while age 70 does not update 80 for NAME=TDR?

jackmelbourne
Fluorite | Level 6

Hi,

 

Sorry, should have made that clearer.

 

I need the lower value for home and the higher value for TDR when there are values for both observations.

 

Thanks,

Jack

data_null__
Jade | Level 19

@jackmelbourne wrote:

Hi,

 

Sorry, should have made that clearer.

 

I need the lower value for home and the higher value for TDR when there are values for both observations.

 

Thanks,

Jack


And the other variables update to last non-missing?

jackmelbourne
Fluorite | Level 6

Yes, exactly.

data_null__
Jade | Level 19

@jackmelbourne wrote:

Yes, exactly.


Updating to last non-missing is standard UPDATE statement.  The conditional min/max makes for a bit of extra code for AGE.

 

data have;
   input Name :$upcase4. Age Net Reg Loss;
   cards;
home 300 200 50     .
home 100 80   .     5
TDR  80  75   .     3
TDR  70   .   5     .
;;;;
   run;
proc print;
   run;
data want;
   if 0 then set have;
   update have(keep=name obs=0) have(drop=age);
   by name;
   if first.name then age=.;
   set have(keep=age rename=(age=_age));
   select(name);
      when('HOME') age = min(age,_age);
      when('TDR')  age = max(age,_age);
      end;
   drop _age;
   run;
proc print;
   run;

Capture.PNG

jackmelbourne
Fluorite | Level 6

Perfect, thank you so much! Seems to work 🙂

user24feb
Barite | Level 11

(simplistic 🙂 ) solution:

Data A;
  Format Name $8. Value 8. BodyPart $8. UnknownVar $12. Value2 8.;
  Input Name $ Value BodyPart $ UnknownVar $ Value2; 
  Datalines;
Jack 500 head . 23
Jack . head patil 23
Tim 100 shoulder . 24
Tim 100 . Dufwenberg .
;
* ^^^ check if this is similar to your data here ..;

Data B (Keep=Dummy:
		Rename=(DummyName=Name 
				DummyValue=Value
				DummyBodyPart=BodyPart
				DummyUnknownVar=UnknownVar
				DummyValue2=Value2)); 
  Set A;
  Format DummyName $8. DummyValue 8. DummyBodyPart $8. DummyUnknownVar $12. DummyValue2 8.;
  Retain DummyName DummyValue DummyBodyPart DummyUnknownVar DummyValue2;
  By /*Notsorted*/ Name;
  If not Missing (Name) Then DummyName=Name; * hm, different var types are a bit awkward ..;
  If not Missing (Value) Then DummyValue=Value; * .. could try array otherwise ..;
  If not Missing (BodyPart) Then DummyBodyPart=BodyPart; * .. or maybe call execute;
  If not Missing (UnknownVar) Then DummyUnknownVar=UnknownVar;
  If not Missing (Value2) Then DummyValue2=Value2;
  If Last.Name Then Output;
Run;
rogerjdeangelis
Barite | Level 11
Keep the last non-missing value for each variable

I think there is an error in you 'want' table
TDR(AGE)=70 not 80


HAVE
====

Up to 40 obs WORK.HAVE total obs=4

Obs    NAME   AGE  NET  REG  LOSS

 1     home   300  200   50     .
 2     home   100   80    .     5
 3     TDR     80   75    .     3
 4     TDR     70    .    5     .

WANT
====

Obs    NAME   AGE  NET  REG  LOSS

 1     home   100   80   50     5
 2     TDR     70   75    5     3

SOLUTION

* I set this up to work with an arbitrary number of numeric columns;
* Do not need disubl if you hardcode variable names;
* I wish SAS would make the number of coluns available at compilation time;
data have;
input Name $ Age Net Reg Loss;
cards4;
home 300 200 50 .
home 100 80 . 5
TDR 80 75 . 3
TDR 70 . 5 .
;;;;
run;quit;

data _null_;
    set have(obs=1);
    array nums _numeric_;
    call symputx('nums',put(dim(nums),5.));
    rc=dosubl('
      data want;
        set have;
        by name notsorted;
        array nums _numeric_;
        array savs[&nums.] _temporary_;
        do i=1 to dim(nums);
          if nums[i] ne . then savs[i]=nums[i];
        end;
        do i=1 to dim(nums);
          nums[i]=savs[i];
        end;
        if last.name then output;
      run;quit;
    ');
run;quit;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 9079 views
  • 1 like
  • 6 in conversation