Hi:
When SAS creates WORK.NEW, it needs to build something called the "descriptor portion" for the NEW dataset, which records the column names and their attributes. So at compile time SAS scans your program and sets up a buffer area that will hold the column names and the values for each observation. The column names come from any references, assignment statements or SET/MERGE datasets anywhere in your program. At compile time, the buffer area just holds the column names. At execution time, the buffer area holds the observation currently being processed, and at the end of each DATA step iteration SAS writes the "keep" variables to the output dataset (in this case, WORK.NEW).
So, by scanning through your program (from top to bottom) SAS builds a buffer area that looks "sort of" like this:
[pre]
value1 value2 value3 i org_id value _n_ _error_
[/pre]
The first references to variables that SAS finds are the variables in the ARRAY statement (the reference in the KEEP statement isn't used until SAS has a complete list of ALL the variables). Even though the KEEP statement -appears- first in your code, it is not -used- first to make the list of variables in the buffer area (technically called the "Program Data Vector" or PDV).
VALUE1, VALUE2, VALUE3 are put in the buffer from the ARRAY statement. Next, the DO loop is encountered and I is added to the buffer. Then the SET statement is encountered and Org_ID and VALUE are added in whatever order they appear in WORK.TEMP. Finally, _N_ and _ERROR_ are added to the buffer area -- they are internal variables that SAS always uses for DATA step programs.
Next, the Drop or Keep statements or options are applied to this complete list of variables:
[pre]
value1 value2 value3 i org_id value _n_ _error_
K      K      K      D K      D     D   D
[/pre]
K means Keep and D means Drop. SAS has to have the whole column list before it can keep track of the DROP/KEEPs for the variables. So the kept variables are written to WORK.NEW in the order they were added to the buffer: Value1, Value2, Value3, Org_ID. That's why a default PROC PRINT shows the variables in this order. You can control the printed order with a VAR statement inside the PROC PRINT.
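As a small sketch of that last point (assuming WORK.NEW already exists with the four kept variables described above), PROC CONTENTS with the VARNUM option lists the variables in their stored order, and a VAR statement changes only the printed order:
[pre]
** show the variables in the order they are stored in the dataset;
proc contents data=work.new varnum;
run;

** print Org_ID first without changing the dataset itself;
proc print data=work.new;
  var Org_ID Value1 Value2 Value3;
run;
[/pre]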
Your program assumes that there will ALWAYS be 3 observations for every Org_ID -- if an Org_ID only has 2 observations, then the third pass through the DO loop reads the first observation of the NEXT Org_ID, so your SET statement will start to become "off" and can continue to be "off" for the rest of the data. I rarely code a SET statement inside a DO loop of this form -- there are times to use the technique, but this would not be the technique I'd use for this problem.
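To see the problem for yourself, here is a rough reconstruction of the SET-inside-a-DO-loop pattern. It is pieced together from the description above and is not necessarily your exact code; the output name WORK.DEMO_OFF is made up for this illustration. Run it after creating the WORK.TEMP test data from the code at the end of this post:
[pre]
** reconstruction only: assumes exactly 3 observations per Org_ID;
data work.demo_off (keep = Org_ID Value1-Value3);
  array Values {3} Value1-Value3;
  do i = 1 to 3;
    ** each pass reads the NEXT observation, whatever its Org_ID is;
    set work.temp;
    Values{i} = Value;
  end;
run;

proc print data=work.demo_off;
  title 'SET inside DO loop with uneven groups';
run;
[/pre]
With that test data, the third output observation should mix two groups: it carries the Org_ID of the last observation read in the loop (1106) together with the two values that belong to 1105. In this particular data the short groups happen to total 3 observations, so the rows line up again at 1200, but you cannot count on that in general.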
This could be a more robust program if you investigated some other ways to process the data...such as using the SET statement outside the DO loop or using PROC TRANSPOSE.
Both of those methods are illustrated below with some new data that contains one Org_ID (1105) that only has 2 observations and another Org_ID (1106) that has only 1 observation. Using BY group processing -- either with the DATA step program or with PROC TRANSPOSE -- will ensure that every Org_ID BY group stays together for processing. (BY group processing expects the data to be sorted by Org_ID, which WORK.TEMP already is.)
cynthia
[pre]
data work.temp;
  infile datalines;
  input Org_ID $ Value;
  return;
datalines;
1000 0.22
1000 0.12
1000 0.11
1100 0.14
1100 0.34
1100 0.21
1105 0.33
1105 0.44
1106 0.66
1200 0.11
1200 0.22
1200 0.33
;
run;
ods listing;
data work.new2 (keep = Org_ID Value1-Value3 numval);
  ** read work.temp and use by group processing;
  set work.temp;
  by Org_ID;
  ** retain value1, value2, value3 and the counter i;
  ** and declare the values in an ARRAY statement;
  retain value1 value2 value3 i;
  array Values {3} Value1-Value3;
  ** for every "new" Org_ID, initialize values to missing;
  ** and reset i to 0;
  if first.Org_ID then do;
    value1=.; value2=.; value3=.;
    i = 0;
  end;
  ** increment i for every observation;
  i + 1;
  ** assign the array member a value, based on i;
  ** note: assumes at most 3 observations per Org_ID;
  Values{i} = Value;
  ** when the last observation for an Org_ID is read, the array;
  ** holds every value for that group (unused slots stay missing);
  ** create a var to hold the number of values and output;
  ** the new observation;
  if last.Org_ID then do;
    numval = i;
    output;
  end;
run;
proc print data=work.new2;
  title '1) with array and set outside do loop';
run;
proc transpose data=work.temp out=work.new_trans;
  by Org_ID;
  var value;
run;
proc print data=work.new_trans;
  title '2) with transpose';
run;
[/pre]
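One optional variation on the PROC TRANSPOSE approach, offered as a sketch rather than a requirement: by default the transposed columns are named COL1, COL2, COL3 and the output carries a _NAME_ column. The PREFIX= option and a DROP= dataset option produce Value1-Value3 instead, matching the DATA step output. The output name WORK.NEW_TRANS2 is just a name chosen for this example:
[pre]
** name the transposed columns Value1, Value2, ... and drop _NAME_;
proc transpose data=work.temp
               out=work.new_trans2 (drop=_name_)
               prefix=Value;
  by Org_ID;
  var value;
run;

proc print data=work.new_trans2;
  title '3) with transpose and prefix=';
run;
[/pre]
Org_IDs with fewer than 3 observations (1105 and 1106 here) simply get missing values in the unused columns.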