Solved: SET data sets and retain values

Ronein · Posted 12-27-2021 03:02 PM

Hello

What is the way to using data sets t1,t2,t3 in order to get the wanted data set ?

As you can see the desire action is set of data sets but need also to fill up empty fields by the values above

Data t1;
input ID team group Var $ value;
1 555 8 X1 15
1 555 8 X2 30
1 555 8 X3 40
1 555 8 X4 10
1 555 8 X5 28
2 444 7 X1 48
2 444 7 X2 52
2 444 7 X3 20
2 444 7 X4 15
2 444 7 X5 31
;
Run;

Data t2;
input ID Z ;
cards;
1 100
2 200
;
Run;


Data t3;
input ID W ;
cards;
1 50
2 80
;
Run;

/*Wanted data set*/
/*1 555 8 X1 15*/
/*1 555 8 X2 30*/
/*1 555 8 X3 40*/
/*1 555 8 X4 10*/
/*1 555 8 X5 28*/
/*1 555 8 Z 100*/
/*1 555 8 W 50*/
/*2 444 7 X1 48*/
/*2 444 7 X2 52*/
/*2 444 7 X3 20*/
/*2 444 7 X4 15*/
/*2 444 7 X5 31*/
/*2 444 7 Z 200*/
/*2 444 7 W 801*/

Astounding · Posted 12-27-2021 03:43 PM

There will be several ways. Here's a way that uses a single DATA step:

data want;
   set t1;
   by id;
   output;
   if last.id;
   group_no + 1;
   set t2 point=group_no;
    var = "Z";
    value = z;
    output;
   set t3 point=group_no;
   var = "W";
   value = w;
   output;
   drop group_no w z;
run;

If ID really does take on sequential values 1, 2, 3, etc., the program could be simplified. Get rid of GROUP_NO and use ID as the POINT= variable.

View solution in original post

Tom · Posted 12-27-2021 03:20 PM

You should make sure the datasets are using the same structure before using the SET statement to combine them.

So you should have VAR values of Z and W and that actual data in VALUE.

Also Your Z and W datasets only have the ID variable.

Let's assume the other variables are just constant attributes of the ID value.

Data t1;
  input ID team group Var $ value;
cards;
1 555 8 X1 15
1 555 8 X2 30
1 555 8 X3 40
1 555 8 X4 10
1 555 8 X5 28
2 444 7 X1 48
2 444 7 X2 52
2 444 7 X3 20
2 444 7 X4 15
2 444 7 X5 31
;

Data t2;
  input ID Z ;
cards;
1 100
2 200
;

Data t3;
  input ID W ;
cards;
1 50
2 80
;

proc sort data=t1(keep=id team group) out=ids nodupkey;
  by id;
run;

data z ;
  merge ids t2;
  by id;
  length var $8 value 8;
  var='Z';
  value=z;
  drop z;
run;

data w ;
  merge ids t3;
  by id;
  length var $8 value 8;
  var='W';
  value=w;
  drop w;
run;

data want;
  set t1 w z;
  by id;
run;

proc print;
run;

Results

Obs    ID    team    group    Var    value

  1     1     555      8      X1       15
  2     1     555      8      X2       30
  3     1     555      8      X3       40
  4     1     555      8      X4       10
  5     1     555      8      X5       28
  6     1     555      8      W        50
  7     1     555      8      Z       100
  8     2     444      7      X1       48
  9     2     444      7      X2       52
 10     2     444      7      X3       20
 11     2     444      7      X4       15
 12     2     444      7      X5       31
 13     2     444      7      W        80
 14     2     444      7      Z       200

Astounding · Posted 12-27-2021 03:43 PM

There will be several ways. Here's a way that uses a single DATA step:

data want;
   set t1;
   by id;
   output;
   if last.id;
   group_no + 1;
   set t2 point=group_no;
    var = "Z";
    value = z;
    output;
   set t3 point=group_no;
   var = "W";
   value = w;
   output;
   drop group_no w z;
run;

If ID really does take on sequential values 1, 2, 3, etc., the program could be simplified. Get rid of GROUP_NO and use ID as the POINT= variable.

Ronein · Posted 12-28-2021 01:09 AM

Thanks,

The ID in real data is not a sequence number.

May you please tell the terminology that you used in your code.

I would like to understand your code better .

Why do you use "output" statement 3 times?

Why do you use last statement one time?

Why do you use point statement two times?

Data t1;
  input ID team group Var $ value;
cards;
111 555 8 X1 15
111 555 8 X2 30
223 444 7 X3 20
223 444 7 X4 15
111 555 8 X3 40
111 555 8 X4 10
111 555 8 X5 28
223 444 7 X1 48
223 444 7 X2 52
223 444 7 X5 31
;
Run;

Data t2;
  input ID Z ;
cards;
111 100
223 200
;
Run;


Data t3;
  input ID W ;
cards;
111 50
223 80
;
Run;

proc sort data=t1;by ID;Run;
proc sort data=t2;by ID;Run;
proc sort data=t3;by ID;Run;

data want;
set t1;
by id;
output;
if last.id;
group_no + 1;
set t2 point=group_no;
var = "Z";
value = z;
output;
set t3 point=group_no;
var = "W";
value = w;
output;
drop group_no w z;
run;

Astounding · Posted 12-28-2021 02:10 PM

There are three OUTPUT statements, because there are three situations where I want to output an observation: after reading an observation from t1, after reading an observation from t2, and after reading an observation from t3.

There is only one BY statement, so there is only one set of first. and last. variables available. No need to refer to any other such variables, because they don't exist.

Using point= twice is overkill. In retrospect, none of them are needed. The program could be shortened as follows:

data want;
set t1;
by id;
output;
if last.id;
set t2;
var = "Z";
value = z;
output;
set t3;
var = "W";
value = w;
output;
drop w z;
run;

SET data sets and retain values

Re: SET data sets and retain values

Re: SET data sets and retain values

Re: SET data sets and retain values

Re: SET data sets and retain values

Re: SET data sets and retain values

SET data sets and retain values

Re: SET data sets and retain values

Re: SET data sets and retain values

Re: SET data sets and retain values

Re: SET data sets and retain values

Re: SET data sets and retain values

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away