Solved
Contributor
Posts: 24

# Data management

I have about 20 columns of data say X1 through X20 and each column has millions of rows. The data in each cell (column and row) are in character codes like '360' meaning certain kind of disease, similarly '361' etc. etc. I want to see if there is say '360' in column X1 in 1st row and '361' in any of the remaining X2 or X3 or upto X20 columns on the same 1st row then count=1 if not the count=0. Similarly say if the same '360' in 2nd row of column X1 and '361' in any of the other remaining 19 columns on the same 2nd row then count=1 if not then count=0 and so on. Finally, I want to sum all the counts that are 1.

I would appreciate very much, if any kind soul out there would be able to help me in writing SAS codes to solve my problem.

Thank you very much in advance.

Accepted Solutions
Solution
‎03-16-2017 09:17 PM
PROC Star
Posts: 2,344

## Re: Data management

Note that you can write the data step as:

``````data WANT;
set HAVE ;
N= ( X1='360' & whichc('361',of X2-X20) );
run;``````

All Replies
PROC Star
Posts: 2,344

## Re: Data management

Like this?

``````data HAVE;
length X1-X20 \$3;
input X1 X5;
cards;
360 361
360 400
400 401
run;

data WANT;
set HAVE end=LASTOBS;
if X1='360' and whichc('361',of X2-X20) then N+1;
if LASTOBS then putlog N=;
run;

``````

N=1

Contributor
Posts: 24

## Re: Data management

Thank you so much ChrisNZ. Your suggestion almost works. I think, I did not explain too well. Let me describe just like you did:

HAVE (all in digital characters of length 3)

X1 X2 X3 X4
360 361 225 390
360 400 332 423
400 401 360 361

360 210 562 361

225 329 250 662

WANT (added C1 as number either 1 or 0)

X1   X2   X3   X4     C1

360 361 225 390     1   (because '360' is in 1st column and '361' in second column of the same 1st row)
360 400 332 423     0   (because even '360' is in 1st column there is no '361' in any other column on the 2nd row)
400 401 360 361     0   (because '360' is not in the 1st column, even though they are in 3rd and 4th columns)

360 210 562 361     1   (becasuse '360' is in the 1st column and '361' in the 4th column)

225 329 250 662     0   (because neither '360' is in the 1st column nor '361' in any other columns)

In other words, '360' has to be in the first column and '361' can be in any other column in the same row to count 1, otherwise count is 0.

When I ran the codes you suggested it counts 1 for every row entry like this:

 Obs X1 …. X5 …. X20 N 1 360 361 1 2 360 400 1 3 400 401 1

Any suggestions?

Contributor
Posts: 24

## Re: Data management

Thank you again ChrisNZ. I did a small modification and it worked perfectly. Following is my modification:

data HAVE;

length X1-X20 \$3;

input X1 X2 X3 X4;

cards;

360 361 290 332

360 400 401 375

400 401 405 229

225 120 228 360

112 227 865 342

360 345 443 361

360 231 361 870

run;

data WANT;

set HAVE end=LASTOBS;

if X1='360' and whichc('361',of X2-X20) then N=1;

else N=0;

if LASTOBS then putlog N=;

run;

Quit;

Solution
‎03-16-2017 09:17 PM
PROC Star
Posts: 2,344

## Re: Data management

Note that you can write the data step as:

``````data WANT;
set HAVE ;
N= ( X1='360' & whichc('361',of X2-X20) );
run;``````
Contributor
Posts: 24

## Re: Data management

Chris,

Sorry to bother you. I got another similar problem which you helped in March. May be this you can help me for this one also. The problom is:

data HAVE;

length X1-X20 \$3;

input X1 X2 X3 X4; /*All input values are charecters, and I have over 300k rows with 20 columns, there are no data after certain column in each row*/

cards;

360 361 290  332

360

400 401 405

225

112 227 865           342

360       443 361

360              870

360

360

run;

I WANT:

if 360 in first column first row, and there are no other input in any other column in the same row then count is one else blank. So in the above data, for 360 I want count=1 from second row and count=1 each from last two rows. Similarly for 225 I should get count=1 only from 4th row, etc.

I would appreciate very for your kindly help.

Thanks.

-Sijan Sapkota

PROC Star
Posts: 2,344

## Re: Data management

Like this?

``````N= ifn( X1 ne ' ' & coalesce(of X2-X20)=' ' , 1, .);
``````
Contributor
Posts: 67

## Re: Data management

Hey ! Someone here could help me please ! I have 2 datasets like below.

Table1 has more variables than table2, both have commun variables.

But i'm focusting on id, vch and mt commun variables;

data table1;

input (id vch) (\$)  mt ;

cards;

civ     a     30

sen   b     20

sen   a     10

civ     c     30

buk   c     15

;run;

data table2;

input  (id vch) (\$)  mt;

cards;

civ    a   20

sen   b  15

civ    c   40

;run;

This is the resulting table i'm seeking for. And i try explainations below.

civ     a     10     [= 30-20]

sen    b     05    [= 20-15]

sen    a    10     [because there is    no combination "sen   a"  in table2]

civ     c     00     [because 40>30]

buk    c     05     [= 15-(40-30)]

I'm concatening the 2 tables.

For an observation for which this 2 variables (id, vch) are the same:

First:

I keep only one observation.

Second:

For this only and remaining obs,

if  table1.var3 >= table2.var3   then  var3 =  sum(table1.var3, - table2.var3)

if  table1.var3 < table2.var3     then  var3 = 0 and sum(- table1.var3, table2.var3)

will be allocated to another obs which as the same vch value. In this case, this other obs always exist.

Thanks a lot !

PROC Star
Posts: 2,344

Contributor
Posts: 67

## Re: Data management

True. New by here, did not know exactly the rules before.

Thanks

Regards

☑ This topic is solved.

Discussion stats
• 9 replies
• 312 views
• 1 like
• 3 in conversation