## Sub-setting and Combining datasets

Solved
Occasional Contributor
Posts: 6

# Sub-setting and Combining datasets

Dear All,

I would really appreciate if someone can help me with the following issues:

1. I have a data set with a response variable and about 40 predictor variables. I am trying to extract the response as well as soem other predictor values for  which a specific predictor is missing. How do I code it?
2. I have two data sets each with response and predictor variables (number of predictors is not same in both data). If I want to combine the two data as below what do I do?

Data 1: y1, x11, x21, x31, z

Data 2: y2, x12,x22,x32,x42,z

PS: z stands for  predictor with same values for response in both dataset

Thanks

Accepted Solutions
Solution
‎11-01-2011 10:26 PM
Super User
Posts: 10,788

## Sub-setting and Combining datasets

If I understand what you mean correctly.

For first question. You can code like:

data want;

set have;

if missing(score);

run;

For second question.can code likeAssuming variables like x11 x21 x31 x41 ....        one by one increase 1)

```data one;
input y1 x11 x21 x31 z;
cards;
1 1 1 1 1
2 2 2 . 2
3 3 3 3 3
;
run;

data two;
input y2 x12 x22 x32 x42 z;
cards;
1 4 4 4 4 1
2 5 5 . 5 2
3 6 6 6 6 6
;
run;
*make z is the first variable;
data one;retain z;set one;run;
data two;retain z;set two;run;
*union two tables as the order of variables;
proc sql;
create table want as
select * from one
union all
select * from two;
quit;
```

Ksharp

All Replies
PROC Star
Posts: 8,169

## Sub-setting and Combining datasets

Your first question is easy to answer.  Your second gets more complex as one doesn't know the rules you want to follow for joining the two files.  Here is an example that MIGHT answer both questions:

data one;

input y1 x11 x21 x31 z;

cards;

1 1 1 1 1

2 2 2 . 2

3 3 3 3 3

;

data two;

input y2 x12 x22 x32 x42 z;

cards;

1 4 4 4 4 1

2 5 5 . 5 2

3 6 6 6 6 6

;

data want_one;

set one (where=(missing(x31)));

run;

proc sql;

create table want_two as

select a.*, b.x12,b.x22,b.x32,b.x42

from one as a

left join two as b

on a.y1=b.y2

;

quit;

Super User
Posts: 10,788

## Sub-setting and Combining datasets

Your post is ambiguous.

Which variable is response variable? Y1,Y2 or Z?

"extract the response as well as soem other predictor values for  which a specific predictor is missing"

what do you want? Only want those obs which has non-missing response variable?

"if I want to combine the two data as below what do I do?"

You can code like.

data want;

set data1 data2;

run;

If you think Y1 and Y2 is different response variable.

Otherwise, You need to rename these both variables to have the same variable name.

Ksharp

Occasional Contributor
Posts: 6

## Sub-setting and Combining datasets

Dear All,

Thank you for yuor responses to my query, they are much appreciated.

For further clarification:

In my first problem I basically want to extract the name of patients for ( variable name is lets say "name")  which a specific predictor is missing (lets say variable named "score").

In my second problem the study was conducted at two different time points at time point one patient response is y1 and predictor variables measured were

x11, x21, x31, z and at tiem point two the patient response is y2 and predictor variables measured were x12,x22,x32,x42,z. Please see z stands fro the predictor variables which doesnt change with time and the variables starting with x change with time.

I want to conduct a repeated measure analysis (my response is binary) by combining the two responses at 2 different time points  (y1 and y2) for each patient and hence the need for merging the two data.

Thank you

Solution
‎11-01-2011 10:26 PM
Super User
Posts: 10,788

## Sub-setting and Combining datasets

If I understand what you mean correctly.

For first question. You can code like:

data want;

set have;

if missing(score);

run;

For second question.can code likeAssuming variables like x11 x21 x31 x41 ....        one by one increase 1)

```data one;
input y1 x11 x21 x31 z;
cards;
1 1 1 1 1
2 2 2 . 2
3 3 3 3 3
;
run;

data two;
input y2 x12 x22 x32 x42 z;
cards;
1 4 4 4 4 1
2 5 5 . 5 2
3 6 6 6 6 6
;
run;
*make z is the first variable;
data one;retain z;set one;run;
data two;retain z;set two;run;
*union two tables as the order of variables;
proc sql;
create table want as
select * from one
union all
select * from two;
quit;
```

Ksharp

Occasional Contributor
Posts: 6

## Sub-setting and Combining datasets

Thanks a lot!

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
• 5 replies
• 171 views
• 0 likes
• 3 in conversation