Help using Base SAS procedures

PROC SQL: Making one record out of many.

Reply
N/A
Posts: 0

PROC SQL: Making one record out of many.

My table contains 8 variables: ID, VAR1-VAR7.
VAR5 is a character variable and the others are numerical.
Each ID have several records. I want to have only one record for each ID in the following way:

VAR1 = MIN(VAR1)
MIN(VAR3) should be assigned to a new variable, lets say AVAR3=MIN(VAR3).

If VAR4=1 or VAR4=2 in ANY of ID:s records, then I should find the MAX(VAR3) among those. That value should be assigned to a new variable, lets say BVAR3=MAX(VAR3) and VAR2, VAR4, VAR6 and VAR7 should be kept from the same record.

If VAR4~=1 and VAR4~=2 in ALL of ID:s records, then I should find the MAX(VAR2) among those. The assignment to BVAR3 then should be BVAR3=VAR3 from that record and VAR2, VAR4, VAR6 and VAR7 should be kept from the same record.

If ANY of ID:s VAR5='A' or 'B' then I would like to assign the value 1 to the new numerical variable AVAR5, otherwise AVAR5 should have the value 0.

Thanks for any help, with this rather complicated problem.
Frequent Contributor
Posts: 102

Re: PROC SQL: Making one record out of many.

Without sample data I couldn't write test this, so consider it pseudo code.

proc sort data = mytable;
by ID;
run;
proc sql;
create table Var3ORVar2 as
select ID,MAX(VAR4 = 1 or VAR4 = 2) as VAR4_1OR2,
max(var3) as BVAR3,
max(var2) as MaxVar2,
(VAR5 in ('A','B') as AVAR5
from mytable
group by ID;
quit;
data results(keep = ID BVAR3 AVAR5 VAR2 VAR4 VAR6 VAR7);
merge mytable Var3ORVar2;
by ID;
if VAR4_1OR2 then do;
if var3 = MaxVar3 then output;
end;
else do;
BVAR3 = var3;
if var2 = MaxVar2 then output;
end;
run;

It could also be done in a single pass through the data using arrays or hashes, but I think the two step approach is best in this case.

Curtis
N/A
Posts: 0

Re: PROC SQL: Making one record out of many.

I don't understand why you have MAX(VAR4=1 or VAR4=2).

VAR4 is a numerical variabel that can have any integer value. Depending on if VAR4 is in {1,2} in any of ID:s records or not, different actions should be taken.

If VAR4 had been a character variable with possible values {q,w,e,r,t,y} and if depending on if VAR4 had been in {q,w} or not, you should have made different selections, how would you then replace MAX? The answer to that question could possibly clarify your use of MAX(VAR4=1 or VAR4=2).

The VAR1=MIN(VAR1) is missing. Perhaps i should introduce a new variable AVAR1=MIN(VAR1).

I would also like to sum the variable VAR7 from records where VAR3 le BVAR3; BVAR3 assigned depending on the value of VAR4, as described in my first message.

Thanks for your help.
N/A
Posts: 0

Re: PROC SQL: Making one record out of many.

Due to the complexity of your conditions, I would recommend you NOT use proc sql, but use a data step.

proc sort data=table; by id;
data condensed;
set table;
by id;

retain min_var1 min_var3 max_var3 flag rem2a rem4a rem6a rem7a max_var2 rem3 rem2b rem4b rem6b rem7b avar5;

if first.id then do;
min_var1 = var1;
min_var3 = var3;
max_var3 = var3;
flag = 0;
rem2 = var2;
rem4 = var4;
rem6 = var6;
rem7 = var7;
max_var2 = var2;
rem3 = var3;
avar5 = 0;
end;

if var1 < min_var1 then min_var1 = var1;
if var3 < min_var3 then min_var3 = var3;

if var4 = 1 or var4 = 2 then do;
flag=1;
if var3 > max_var3 then max_var3 = var3;
rem2a = var2;
rem4a = var4;
rem6a = var6;
rem7a = var7;
end;

if var2 > max_var2 then do;
rem3 = var3;
rem2b = var2;
rem4b = var4;
rem6b = var6;
rem7b = var7;
end;

if var5 = 'A' or var5 = 'B' then avar5 = 1;

if last.id then do;
var1 = min_var1;
avar3 = min_var3;
if flag = 1 then do;
bvar3 = max_var3;
var2 = rem2a;
var4 = rem4a;
var6 = rem6a;
var7 = rem7a;
output;
end;
if flag = 0 then do;
bvar3 = rem3;
var2 = rem2b;
var4 = rem4b;
var6 = rem6b;
var7 = rem7b;
output;
end;
end;
drop min_var1 min_var3 max_var3 rem2a rem4a rem6a rem7a max_var2 rem3 rem2b rem4b rem6b rem7b;
run;

The beauty of SAS is that you have so many ways to work with your data. You have the choice of using SQL when it is appropriate -- for joins, selects, database access, etc. -- and data steps when it is appropriate -- reading files, parsing, complicated data selections and summaries like this case, data groupings difficult to implement in SQL, etc.

"Use the right tool for the job" it's easier, quicker and the results are soooooo much better.
Ask a Question
Discussion stats
  • 3 replies
  • 128 views
  • 0 likes
  • 2 in conversation