DATA Step, Macro, Functions and more

Matching Respondent ID with Respondent Age

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 5
Accepted Solution

Matching Respondent ID with Respondent Age

Hello,

 

I have a situation in which I have a data set that contains respondent ID numbers in a single column and corresponding respondent ages in multiple columns. I need to match each individual respondent ID (PID) with its corresponding respondent age (a1-a5). A variable named TY is a binary variable that is coded as "1" or "2." TY as "1" is primary row that contains ages a1-a5. When "TY" is coded as "2" there are no row value for age. As you can see there are more ages than PIDs. I am working on a large dataset and seek a clean and quick method of matching each PID to each age. Any suggestions? Thanks. Here's how the data looks like:

 

data one;
input id$ PID$ TY a1 a2 a3 a4 a5;
cards;
CT0121   CT0121A1     1  23 35 45 65 76
CT0121   CT0121A2     2  . . . . .
CT0121   CT0121A3     2  . . . . .
CT0121   CT0121A4     2  . . . . .
CT0200   CT0200A1     1  34 36 38 40 42
CT0200   CT0200A2     2  . . . . .
NS10500 NS10500A1   1  21 23 24 25 90
NS10500 NS10500A2   2  . . . . .
NS10500 NS10500A3   2  . . . . .
;
run; quit;


Accepted Solutions
Solution
‎04-04-2016 09:24 PM
Occasional Contributor
Posts: 5

Re: Matching Respondent ID with Respondent Age

Awesome! Will do!

View solution in original post


All Replies
Super User
Posts: 10,550

Re: Matching Respondent ID with Respondent Age

And what would the data look like afterward?

Occasional Contributor
Posts: 5

Re: Matching Respondent ID with Respondent Age

From this data structure:

 

CT0121   CT0121A1     1  23 35 45 65 76
CT0121   CT0121A2     2  . . . . .
CT0121   CT0121A3     2  . . . . .
CT0121   CT0121A4     2  . . . . .
CT0200   CT0200A1     1  34 36 38 40 42
CT0200   CT0200A2     2  . . . . .
NS10500 NS10500A1   1  21 23 24 25 90
NS10500 NS10500A2   2  . . . . .
NS10500 NS10500A3   2  . . . . .

 

To here's how it needs to look like:

 

ID             PID                Age

 

CT0121   CT0121A1     23
CT0121   CT0121A2     35
CT0121   CT0121A3     45
CT0121   CT0121A4     65
CT0200   CT0200A1     34
CT0200   CT0200A2     36
NS10500 NS10500A1   21
NS10500 NS10500A2   23
NS10500 NS10500A3   24

 

Thanks for the reply!

Respected Advisor
Posts: 4,662

Re: Matching Respondent ID with Respondent Age

A simple guess..

 

data two;
set one;
array a{5};
do i = 1 to dim(a);
    if not missing(a{i}) then do;
        age = a{i};
        output;
        end;
    end;
keep id pid age;
run;
PG
Occasional Contributor
Posts: 5

Re: Matching Respondent ID with Respondent Age

Still not resolved. PIDs are changed to single value when I began with individual values. Please refer to last post of what data should look like. Thanks.

 

 

Obs   id           PID              age

1       CT0121 CT0121A1   23
2       CT0121 CT0121A1   35
3       CT0121 CT0121A1   45
4       CT0121 CT0121A1   65
5       CT0121 CT0121A1   76
6       CT0200 CT0200A1   34
7       CT0200 CT0200A1   36
8       CT0200 CT0200A1   38
9       CT0200 CT0200A1   40
10     CT0200 CT0200A1   42
11     NS10500 NS10500A 21
12     NS10500 NS10500A 23
13     NS10500 NS10500A 24
14     NS10500 NS10500A 25
15     NS10500 NS10500A 90

 

Respected Advisor
Posts: 4,662

Re: Matching Respondent ID with Respondent Age

Less of a guess this time:

 

data one;
input id :$10. PID :$12. TY a1 a2 a3 a4 a5;
cards;
CT0121   CT0121A1     1  23 35 45 65 76
CT0121   CT0121A2     2  . . . . .
CT0121   CT0121A3     2  . . . . .
CT0121   CT0121A4     2  . . . . .
CT0200   CT0200A1     1  34 36 38 40 42
CT0200   CT0200A2     2  . . . . .
NS10500 NS10500A1   1  21 23 24 25 90
NS10500 NS10500A2   2  . . . . .
NS10500 NS10500A3   2  . . . . .
;

data two;
set one;
array a{5};
array v{5} _temporary_;
if ty = 1 then
    do i = 1 to dim(a);
        v{i} = a{i};
        end;
i = input(scan(pid, -1, "A"), 2.);
if i > 0 and i <= dim(v) then do;
    age = v{i};
    output;
    end;
keep id pid age;
run;
PG
Occasional Contributor
Posts: 5

Re: Matching Respondent ID with Respondent Age

HI!   Thanks but I don't get all the observations processed in the array you sent out.

 

I run this code:

 

data two;
set one;
array a{5};
array v{5} _temporary_;
if ty = 1 then
do i = 1 to dim(a);
v{i} = a{i};
end;
i = input(scan(pid, -1, "A"), 2.);
if i > 0 and i <= dim(v) then do;
age = v{i};
output;
end;
keep id pid age;
run;

proc print data=two; var id pid age; run;

 

This is the ouput I get:

 

The SAS System 08:40 Monday, April 4, 2016 30

Obs id PID age

1 CT0121 CT0121A1 23
2 CT0121 CT0121A2 35
3 CT0121 CT0121A3 45
4 CT0121 CT0121A4 65
5 CT0200 CT0200A1 34
6 CT0200 CT0200A2 36

 

And I get the following from the log output:

 

 


383
384 data two;
385 set one;
386 array a{5};
387 array v{5} _temporary_;
388 if ty = 1 then
389 do i = 1 to dim(a);
390 v{i} = a{i};
391 end;
392 i = input(scan(pid, -1, "A"), 2.);
393 if i > 0 and i <= dim(v) then do;
394 age = v{i};
395 output;
396 end;
397 keep id pid age;
398 run;

NOTE: Invalid argument to function INPUT at line 392 column 5.
id=NS10500 PID=NS10500A TY=1 a1=21 a2=23 a3=24 a4=25 a5=90 i=. age=. _ERROR_=1 _N_=7
NOTE: Invalid argument to function INPUT at line 392 column 5.
id=NS10500 PID=NS10500A TY=2 a1=. a2=. a3=. a4=. a5=. i=. age=. _ERROR_=1 _N_=8
NOTE: Invalid argument to function INPUT at line 392 column 5.
id=NS10500 PID=NS10500A TY=2 a1=. a2=. a3=. a4=. a5=. i=. age=. _ERROR_=1 _N_=9
NOTE: Mathematical operations could not be performed at the following places. The results of the
operations have been set to missing values.
Each place is given by: (Number of times) at (Line)Smiley SadColumn).
3 at 392:5
NOTE: There were 9 observations read from the data set WORK.ONE.
NOTE: The data set WORK.TWO has 6 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds


399
400 proc print data=two; var id pid age; run;

NOTE: There were 6 observations read from the data set WORK.TWO.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

 

I don't understand the error message about the INPUT Statement. Any suggestions? For some reason the last observations with a "NS" prefix are not included due to the error in the PUT statment.

 

Thanks a lot!

Respected Advisor
Posts: 4,662

Re: Matching Respondent ID with Respondent Age

The problem is with the way you read the data in. Variables id and pid are too short to contain the longer codes. My program uses the last part if the pid code (after the "A") to decide which array element to output. When that part of the pid code is missing, an error occurs. 

Make sure you read the data in properly, as I did in the example ( input id :$10. PID :$12. ), and the rest of the program will work.

PG
Solution
‎04-04-2016 09:24 PM
Occasional Contributor
Posts: 5

Re: Matching Respondent ID with Respondent Age

Awesome! Will do!

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 336 views
  • 1 like
  • 3 in conversation