BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jackmelbourne
Fluorite | Level 6

Dear All,

 

I would like to combine two rows of data with missing information in different rows.

 

For example:

 

Jack / 500 / head /     / 23

Jack /       / head / patil / 23

Tim / 100 / shoulder/     / 24

Tim / 100 /              / Dufwenberg /   

 

The end result would be:

Jack / 500 / head / patil / 23

Tim / 100 / shoulder/ Dufwenberg / 24

 

Any suggestions of how to perform this in SAS?

 

Thank you,

Jack

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

@jackmelbourne wrote:

Yes, exactly.


Updating to last non-missing is standard UPDATE statement.  The conditional min/max makes for a bit of extra code for AGE.

 

data have;
   input Name :$upcase4. Age Net Reg Loss;
   cards;
home 300 200 50     .
home 100 80   .     5
TDR  80  75   .     3
TDR  70   .   5     .
;;;;
   run;
proc print;
   run;
data want;
   if 0 then set have;
   update have(keep=name obs=0) have(drop=age);
   by name;
   if first.name then age=.;
   set have(keep=age rename=(age=_age));
   select(name);
      when('HOME') age = min(age,_age);
      when('TDR')  age = max(age,_age);
      end;
   drop _age;
   run;
proc print;
   run;

Capture.PNG

View solution in original post

11 REPLIES 11
LinusH
Tourmaline | Level 20

Is this the only case you have, duplicates on two rows, and all non missing values are the same?

I would try to use SQL with max() for this specific scenario.

Data never sleeps
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, not typing that in, so this is just examples (note: post test data in the form of a datastep for better answers):

Sort and merge each variable with column1

Use retain statements in one datatep and only output at last.column1

Transpose the data and sort nodupkey.

jackmelbourne
Fluorite | Level 6

Sorry, maybe I could make my question a bit clearer with this dataset?

 

For example:

 

dataset: housing

 

Name       Age      Net     Reg     Loss

home        300        200     50        . 

home        100        80        .          5

TDR          80          75        .         3

TDR          70           .          5        .

 

I would like to transform the housing data set to look like this

 

Name       Age      Net     Reg     Loss

home        100        80      50         5

TDR          80          75       5         3

 

What code would you recommend?

 

 

data_null__
Jade | Level 19

@jackmelbourne wrote:

Sorry, maybe I could make my question a bit clearer with this dataset?

 

For example:

 

dataset: housing

 

Name       Age      Net     Reg     Loss

home        300        200     50        . 

home        100        80        .          5

TDR          80          75        .         3

TDR          70           .          5        .

 

I would like to transform the housing data set to look like this

 

Name       Age      Net     Reg     Loss

home        100        80      50         5

TDR          80          75       5         3

 

What code would you recommend?

 

 


Why does age 100 update 300 for NAME=HOME while age 70 does not update 80 for NAME=TDR?

jackmelbourne
Fluorite | Level 6

Hi,

 

Sorry, should have made that clearer.

 

I need the lower value for home and the higher value for TDR when there are values for both observations.

 

Thanks,

Jack

data_null__
Jade | Level 19

@jackmelbourne wrote:

Hi,

 

Sorry, should have made that clearer.

 

I need the lower value for home and the higher value for TDR when there are values for both observations.

 

Thanks,

Jack


And the other variables update to last non-missing?

jackmelbourne
Fluorite | Level 6

Yes, exactly.

data_null__
Jade | Level 19

@jackmelbourne wrote:

Yes, exactly.


Updating to last non-missing is standard UPDATE statement.  The conditional min/max makes for a bit of extra code for AGE.

 

data have;
   input Name :$upcase4. Age Net Reg Loss;
   cards;
home 300 200 50     .
home 100 80   .     5
TDR  80  75   .     3
TDR  70   .   5     .
;;;;
   run;
proc print;
   run;
data want;
   if 0 then set have;
   update have(keep=name obs=0) have(drop=age);
   by name;
   if first.name then age=.;
   set have(keep=age rename=(age=_age));
   select(name);
      when('HOME') age = min(age,_age);
      when('TDR')  age = max(age,_age);
      end;
   drop _age;
   run;
proc print;
   run;

Capture.PNG

jackmelbourne
Fluorite | Level 6

Perfect, thank you so much! Seems to work 🙂

user24feb
Barite | Level 11

(simplistic 🙂 ) solution:

Data A;
  Format Name $8. Value 8. BodyPart $8. UnknownVar $12. Value2 8.;
  Input Name $ Value BodyPart $ UnknownVar $ Value2; 
  Datalines;
Jack 500 head . 23
Jack . head patil 23
Tim 100 shoulder . 24
Tim 100 . Dufwenberg .
;
* ^^^ check if this is similar to your data here ..;

Data B (Keep=Dummy:
		Rename=(DummyName=Name 
				DummyValue=Value
				DummyBodyPart=BodyPart
				DummyUnknownVar=UnknownVar
				DummyValue2=Value2)); 
  Set A;
  Format DummyName $8. DummyValue 8. DummyBodyPart $8. DummyUnknownVar $12. DummyValue2 8.;
  Retain DummyName DummyValue DummyBodyPart DummyUnknownVar DummyValue2;
  By /*Notsorted*/ Name;
  If not Missing (Name) Then DummyName=Name; * hm, different var types are a bit awkward ..;
  If not Missing (Value) Then DummyValue=Value; * .. could try array otherwise ..;
  If not Missing (BodyPart) Then DummyBodyPart=BodyPart; * .. or maybe call execute;
  If not Missing (UnknownVar) Then DummyUnknownVar=UnknownVar;
  If not Missing (Value2) Then DummyValue2=Value2;
  If Last.Name Then Output;
Run;
rogerjdeangelis
Barite | Level 11
Keep the last non-missing value for each variable

I think there is an error in you 'want' table
TDR(AGE)=70 not 80


HAVE
====

Up to 40 obs WORK.HAVE total obs=4

Obs    NAME   AGE  NET  REG  LOSS

 1     home   300  200   50     .
 2     home   100   80    .     5
 3     TDR     80   75    .     3
 4     TDR     70    .    5     .

WANT
====

Obs    NAME   AGE  NET  REG  LOSS

 1     home   100   80   50     5
 2     TDR     70   75    5     3

SOLUTION

* I set this up to work with an arbitrary number of numeric columns;
* Do not need disubl if you hardcode variable names;
* I wish SAS would make the number of coluns available at compilation time;
data have;
input Name $ Age Net Reg Loss;
cards4;
home 300 200 50 .
home 100 80 . 5
TDR 80 75 . 3
TDR 70 . 5 .
;;;;
run;quit;

data _null_;
    set have(obs=1);
    array nums _numeric_;
    call symputx('nums',put(dim(nums),5.));
    rc=dosubl('
      data want;
        set have;
        by name notsorted;
        array nums _numeric_;
        array savs[&nums.] _temporary_;
        do i=1 to dim(nums);
          if nums[i] ne . then savs[i]=nums[i];
        end;
        do i=1 to dim(nums);
          nums[i]=savs[i];
        end;
        if last.name then output;
      run;quit;
    ');
run;quit;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 7077 views
  • 1 like
  • 6 in conversation