BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jlopez1
Fluorite | Level 6

Hello,

 

I have a situation in which I have a data set that contains respondent ID numbers in a single column and corresponding respondent ages in multiple columns. I need to match each individual respondent ID (PID) with its corresponding respondent age (a1-a5). A variable named TY is a binary variable that is coded as "1" or "2." TY as "1" is primary row that contains ages a1-a5. When "TY" is coded as "2" there are no row value for age. As you can see there are more ages than PIDs. I am working on a large dataset and seek a clean and quick method of matching each PID to each age. Any suggestions? Thanks. Here's how the data looks like:

 

data one;
input id$ PID$ TY a1 a2 a3 a4 a5;
cards;
CT0121   CT0121A1     1  23 35 45 65 76
CT0121   CT0121A2     2  . . . . .
CT0121   CT0121A3     2  . . . . .
CT0121   CT0121A4     2  . . . . .
CT0200   CT0200A1     1  34 36 38 40 42
CT0200   CT0200A2     2  . . . . .
NS10500 NS10500A1   1  21 23 24 25 90
NS10500 NS10500A2   2  . . . . .
NS10500 NS10500A3   2  . . . . .
;
run; quit;

1 ACCEPTED SOLUTION

Accepted Solutions
8 REPLIES 8
ballardw
Super User

And what would the data look like afterward?

jlopez1
Fluorite | Level 6

From this data structure:

 

CT0121   CT0121A1     1  23 35 45 65 76
CT0121   CT0121A2     2  . . . . .
CT0121   CT0121A3     2  . . . . .
CT0121   CT0121A4     2  . . . . .
CT0200   CT0200A1     1  34 36 38 40 42
CT0200   CT0200A2     2  . . . . .
NS10500 NS10500A1   1  21 23 24 25 90
NS10500 NS10500A2   2  . . . . .
NS10500 NS10500A3   2  . . . . .

 

To here's how it needs to look like:

 

ID             PID                Age

 

CT0121   CT0121A1     23
CT0121   CT0121A2     35
CT0121   CT0121A3     45
CT0121   CT0121A4     65
CT0200   CT0200A1     34
CT0200   CT0200A2     36
NS10500 NS10500A1   21
NS10500 NS10500A2   23
NS10500 NS10500A3   24

 

Thanks for the reply!

PGStats
Opal | Level 21

A simple guess..

 

data two;
set one;
array a{5};
do i = 1 to dim(a);
    if not missing(a{i}) then do;
        age = a{i};
        output;
        end;
    end;
keep id pid age;
run;
PG
jlopez1
Fluorite | Level 6

Still not resolved. PIDs are changed to single value when I began with individual values. Please refer to last post of what data should look like. Thanks.

 

 

Obs   id           PID              age

1       CT0121 CT0121A1   23
2       CT0121 CT0121A1   35
3       CT0121 CT0121A1   45
4       CT0121 CT0121A1   65
5       CT0121 CT0121A1   76
6       CT0200 CT0200A1   34
7       CT0200 CT0200A1   36
8       CT0200 CT0200A1   38
9       CT0200 CT0200A1   40
10     CT0200 CT0200A1   42
11     NS10500 NS10500A 21
12     NS10500 NS10500A 23
13     NS10500 NS10500A 24
14     NS10500 NS10500A 25
15     NS10500 NS10500A 90

 

PGStats
Opal | Level 21

Less of a guess this time:

 

data one;
input id :$10. PID :$12. TY a1 a2 a3 a4 a5;
cards;
CT0121   CT0121A1     1  23 35 45 65 76
CT0121   CT0121A2     2  . . . . .
CT0121   CT0121A3     2  . . . . .
CT0121   CT0121A4     2  . . . . .
CT0200   CT0200A1     1  34 36 38 40 42
CT0200   CT0200A2     2  . . . . .
NS10500 NS10500A1   1  21 23 24 25 90
NS10500 NS10500A2   2  . . . . .
NS10500 NS10500A3   2  . . . . .
;

data two;
set one;
array a{5};
array v{5} _temporary_;
if ty = 1 then
    do i = 1 to dim(a);
        v{i} = a{i};
        end;
i = input(scan(pid, -1, "A"), 2.);
if i > 0 and i <= dim(v) then do;
    age = v{i};
    output;
    end;
keep id pid age;
run;
PG
jlopez1
Fluorite | Level 6

HI!   Thanks but I don't get all the observations processed in the array you sent out.

 

I run this code:

 

data two;
set one;
array a{5};
array v{5} _temporary_;
if ty = 1 then
do i = 1 to dim(a);
v{i} = a{i};
end;
i = input(scan(pid, -1, "A"), 2.);
if i > 0 and i <= dim(v) then do;
age = v{i};
output;
end;
keep id pid age;
run;

proc print data=two; var id pid age; run;

 

This is the ouput I get:

 

The SAS System 08:40 Monday, April 4, 2016 30

Obs id PID age

1 CT0121 CT0121A1 23
2 CT0121 CT0121A2 35
3 CT0121 CT0121A3 45
4 CT0121 CT0121A4 65
5 CT0200 CT0200A1 34
6 CT0200 CT0200A2 36

 

And I get the following from the log output:

 

 


383
384 data two;
385 set one;
386 array a{5};
387 array v{5} _temporary_;
388 if ty = 1 then
389 do i = 1 to dim(a);
390 v{i} = a{i};
391 end;
392 i = input(scan(pid, -1, "A"), 2.);
393 if i > 0 and i <= dim(v) then do;
394 age = v{i};
395 output;
396 end;
397 keep id pid age;
398 run;

NOTE: Invalid argument to function INPUT at line 392 column 5.
id=NS10500 PID=NS10500A TY=1 a1=21 a2=23 a3=24 a4=25 a5=90 i=. age=. _ERROR_=1 _N_=7
NOTE: Invalid argument to function INPUT at line 392 column 5.
id=NS10500 PID=NS10500A TY=2 a1=. a2=. a3=. a4=. a5=. i=. age=. _ERROR_=1 _N_=8
NOTE: Invalid argument to function INPUT at line 392 column 5.
id=NS10500 PID=NS10500A TY=2 a1=. a2=. a3=. a4=. a5=. i=. age=. _ERROR_=1 _N_=9
NOTE: Mathematical operations could not be performed at the following places. The results of the
operations have been set to missing values.
Each place is given by: (Number of times) at (Line):(Column).
3 at 392:5
NOTE: There were 9 observations read from the data set WORK.ONE.
NOTE: The data set WORK.TWO has 6 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds


399
400 proc print data=two; var id pid age; run;

NOTE: There were 6 observations read from the data set WORK.TWO.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

 

I don't understand the error message about the INPUT Statement. Any suggestions? For some reason the last observations with a "NS" prefix are not included due to the error in the PUT statment.

 

Thanks a lot!

PGStats
Opal | Level 21

The problem is with the way you read the data in. Variables id and pid are too short to contain the longer codes. My program uses the last part if the pid code (after the "A") to decide which array element to output. When that part of the pid code is missing, an error occurs. 

Make sure you read the data in properly, as I did in the example ( input id :$10. PID :$12. ), and the rest of the program will work.

PG
jlopez1
Fluorite | Level 6

Awesome! Will do!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1220 views
  • 1 like
  • 3 in conversation