Solved: Re: Data-driven assignment of initial value list of an array?

genemroz · Posted 08-02-2021 10:40 AM

Esteemed Advisers.

I suspect there is a fairly straightforward solution for this situation. But I've not encountered it before so I'm looking for your advice.

In the code below, in the data step that generates dataset QC_OUT, the goal is to populate variables station1 and station2 with all possible pairwise combinations of the non-missing values of STA1 through STA10 as read from dataset RANDOM_METEORS. For each record of RANDOM_METEORS, I want to populate the array STATION[] with the non-missing values of STA1 through STA10. I currently have to identify by inspection the maximum value of STATIONCOUNT among all the records in order to hard-code the dimension of array STATION and to specify the limits of the two loops that follow the ARRAY statement. I'm asking for your help in finding a way to use the value of STATIONCOUNT to properly form the ARRAY statement and the loops. I know I can substitute STATIONCOUNT for the array dimension and use it for loop control but I don't know how to code the initial-value list to encompass the next dimension of array STATION as the data step moves to the next record.

BTW, the data step for initial data was created using the macro %data2datastep. I recently discovered this, and after figuring out how to use it (I thought the documentation might be a little weak for some), find it to be immensely helpful in preparing code in support of submitting questions to this forum.

Many thanks in advance for any advice you can provide,

Gene

data WORK.MERGED_MAXMIN_TXTYTZ;
  infile datalines dsd truncover;
  input concat:$70. _FREQ_:32. txmax:32. tymax:32. tzmax:32. txmin:32. tymin:32. tzmin:32. stationcount:32.;
datalines4;
US000S/US000V,79,30,-90,120,-200,-280,70,2
US000S/US000V/US001E,832,20,-110,120,-200,-270,70,3
US000S/US000V/US001E/US001Q,337,-10,-130,120,-200,-310,70,4
US000S/US000V/US001Q,5,-10,-280,120,-20,-300,80,3
US000S/US001E,37,-30,-120,120,-220,-350,70,2
US000S/US001E/US001Q,2115,-40,-120,120,-270,-360,70,3
US000S/US001Q,773,-210,-140,120,-390,-340,70,2
US000U/US000V,20,50,20,120,30,-40,90,2
US000V/US001E,514,40,-100,120,-170,-220,70,2
US000V/US001R,101,-110,0,120,-160,-50,70,2
US001E/US001Q,343,-90,-330,120,-270,-400,70,2
;;;;
run;
/* SAS macro from Wicklin for generating Random Numbers between a Min and Max */
%macro RandBetween(min, max);
   (&min + floor((1+&max-&min)*rand("uniform")))
%mend;
data Random_Meteors (drop= r _freq_ temp);
set MERGED_MAXMIN_TXTYTZ;
by concat;
call streaminit(123456);
if missing(concat) then delete;
gridpoints=_freq_;
if gridpoints=1 then delete;
meteorct=0;
do while (meteorct<1);
   TX_Start = %RandBetween(txmin, txmax);
   TY_Start = %RandBetween(tymin, tymax);
   TZ_Start = %RandBetween(90, 120);
   TX_End = %RandBetween(txmin, txmax);
   TY_End = %RandBetween(tymin, tymax);
   TZ_End = %RandBetween(70, 100);
   if tz_end > tz_start then do;
      temp=tz_end;
      tz_end=tz_start;
      tz_start=temp;
   end;
   TrackLength=sqrt((TX_Start-TX_End)**2+(TY_Start-TY_End)**2+(TZ_Start-TZ_End)**2);
   r=divide((tz_start-tz_end),tracklength);
   Entryangle=arsin(r)*180/constant('pi');
   /* if tracklength<10 or EntryAngle<15, generate another meteor*/
   if tracklength<10 or entryangle<15 then continue;
   meteorct+1;
   /*Separate concat for Qc analysis*/
   sta1=scan(concat,1,'/');
   sta2=scan(concat,2,'/');
   sta3=scan(concat,3,'/');
   sta4=scan(concat,4,'/');
   sta5=scan(concat,5,'/');
   sta6=scan(concat,6,'/');
   sta7=scan(concat,7,'/');
   sta8=scan(concat,8,'/');
   sta9=scan(concat,9,'/');
   sta10=scan(concat,10,'/');
   output;
end;
run;
/*Calculate Qc for each Random Meteor in FoV of all station pairs*/
Data QC_out (keep=stationcount concat station1 station2);
set random_meteors;

/* QUESTION: How to define array  STATION[] to process all possible pairs of up to STA1-STA10 in RANDOM_METEORS by using
 STATIONCOUNT for each record in RANDOM_METEORS as the dimension for STATION[]? 
 
Don't know how to declare variable values STA1-STA10 for array STATION
to be consistent with variable array dimension that would come from STATIONCOUNT. 
 
Currently hard-coded with maximum of STATIONCOUNT found by inspection*/

array station[4] $ STA1-STA4;
do i=1 to 3;
   do j=i+1 to 4;
      station1= station{i};
      station2= station{j};
      if station2 eq '' then continue;
      output;
   end;
end;
run;

Tom · Posted 08-02-2021 10:59 AM

I cannot tease out from the long posting what your question is. Can you clarify what your issue is?

Are you trying to find the min and max over a whole dataset of multiple variables?

data want;
do until(eof);
  set have end=eof;
  min = min(of min sta1-sta10);
  max = max(of max sta1-sta10);
end;
keep min max;
run;

One thing to note is that there is a simpler way to generate a random integer: use the 'INTEGER' distribution of the RAND() function.

   TX_Start = rand('integer',txmin, txmax);
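
As a rough illustration of that tip, the sketch below draws from the same range both ways: once with the 'INTEGER' distribution and once with the floor/uniform expression used in the %RandBetween macro. The bounds are made-up sample values and K is a hypothetical loop counter. The 'INTEGER' distribution needs a reasonably recent SAS release (9.4M5 or later, as far as I know).

```sas
/* Sketch only: compare two ways of drawing a random integer in [txmin, txmax]. */
data _null_;
   call streaminit(123456);
   txmin = -200;                      /* sample bounds, not taken from real data */
   txmax = 30;
   do k = 1 to 5;
      a = rand('integer', txmin, txmax);                         /* built-in distribution */
      b = txmin + floor((1 + txmax - txmin) * rand('uniform'));  /* macro-style formula   */
      put a= b=;
   end;
run;
```

Both expressions return integers between the two bounds, inclusive. The two columns will not match each other because each call consumes its own draw from the random stream.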

genemroz · Posted 08-02-2021 11:09 AM

Thanks, Tom, for your prompt response. And thanks, also, for the tip on random integers.

The most succinct way I can describe my question is: If I use the variable STATIONCOUNT to set the dimension of array STATION, how do I code the initial values?

Hope that clarifies the issue,

Gene

Tom · Posted 08-02-2021 11:44 AM

Still not seeing it. If the variable STATIONCOUNT has a single value then put it into a macro variable and use the macro variable to generate the code. Looks like you have STATIONCOUNT on multiple observations so you probably want to take the maximum to make sure the array is defined large enough.

So if max(STATIONCOUNT) is 4 and you want to generate an array named STATION that has four variables named STATION1 to STATION4, then you would just code:

proc sql noprint;
 select max(stationcount) format=32. into :stations trimmed
 from have ;
quit;
data want;
  set have;
  array station [&stations] ;
  ...
run;

Now, as to initial values: what are they? Where are they coming from? If they are constants, like zero, then just use the ARRAY statement to set the initial values.

array station [&stations] (&stations*0);
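
To make the distinction concrete, here is a small sketch contrasting a variable list (the array points at variables that already exist) with an initial-value list (constants in parentheses). The sample station codes and the WEIGHT array are hypothetical and only serve the illustration.

```sas
/* Sketch only: variable list versus initial-value list in an ARRAY statement. */
data _null_;
   length sta1-sta3 $ 6;
   sta1 = 'US000S';
   sta2 = 'US000V';
   sta3 = ' ';
   /* Variable list: the array is just another name for STA1-STA3. */
   array sta [3] $ sta1-sta3;
   /* Initial-value list: creates WEIGHT1-WEIGHT3 and sets each to the constant 0. */
   array weight [3] (3*0);
   put sta2= weight2=;
run;
```

Values that arrive from a dataset belong to the first form: the SET statement fills the variables on each observation, and the array simply refers to them.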

genemroz · Posted 08-02-2021 11:53 AM

Tom,

Thanks for hanging in there with me. The initial values (STA1 through STA10), as well as STATIONCOUNT, come from dataset RANDOM_METEORS.

Gene

Tom · Posted 08-02-2021 12:03 PM

Sounds like you are looking for the SET statement.

set RANDOM_METEORS (keep = STA1-STA10 STATIONCOUNT);

So you might be wanting to do something like this?

data want;
   set RANDOM_METEORS (keep = STA1-STA10 STATIONCOUNT);
   array sta sta1-sta10;
   do index=1 to min(stationcount,10);
     * do something with sta[index] ;
   end;
run;
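
Applied to the pairwise loop from the original question, the same idea might look like the sketch below. It assumes RANDOM_METEORS holds character variables STA1-STA10 and a numeric STATIONCOUNT, as built earlier in the thread. N is a hypothetical helper variable for the per-record upper bound.

```sas
/* Sketch only: enumerate every station pair for each record. */
data QC_out (keep=stationcount concat station1 station2);
   set random_meteors;
   array sta [*] $ sta1-sta10;         /* dimension fixed at compile time     */
   n = min(stationcount, dim(sta));    /* per-record bound, capped at 10      */
   do i = 1 to n - 1;
      do j = i + 1 to n;
         station1 = sta[i];
         station2 = sta[j];
         output;                       /* one row per unordered pair (i, j)   */
      end;
   end;
run;
```

Because the loops stop at the record's own station count, the check for a blank STATION2 from the original code should no longer be needed, provided STATIONCOUNT matches the number of stations in CONCAT.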

genemroz · Posted 08-02-2021 12:32 PM

Tom,

Thanks for this. It's the solution I was searching for. Declaring array STA as STA1-STA10 was the key. I was trying to declare for a dimension as dictated by STATIONCOUNT. I'm marking your solution as ACCEPTED.

Thanks again for your patience,

Gene

Tom · Posted 08-02-2021 12:37 PM

All of these forms are equivalent.

array sta [10] ;
array sta sta1-sta10;
array sta [10] sta1-sta10;
array sta [*] sta1-sta10;
array sta sta1 sta2 sta3 sta4 sta5 sta6 sta7 sta8 sta9 sta10 ;

Plus you can use () or {} instead of [] if you would rather.
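
A minimal sketch of how the [*] form pairs with the DIM() function, so the loop bound follows whatever variable list the array was given. It assumes RANDOM_METEORS exists as created earlier in the thread. NONMISSING and K are hypothetical names.

```sas
/* Sketch only: count the non-blank stations in the first record. */
data _null_;
   set random_meteors (obs=1);
   array sta [*] sta1-sta10;
   nonmissing = 0;
   do k = 1 to dim(sta);               /* dim(sta) is 10 for this variable list */
      if not missing(sta[k]) then nonmissing = nonmissing + 1;
   end;
   put concat= nonmissing=;
run;
```

If the variable list later changes, for example to STA1-STA12, the loop adapts without any other edit.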

Tom · Posted 08-02-2021 12:26 PM

I think I see your confusion. You state:

I know I can substitute STATIONCOUNT for the array dimension and use it for loop control but I don't know how to code the initial-value list to encompass the next dimension of array.

You do NOT change the dimension of the array during a data step. It is determined when the code for the step is compiled.

You can take the maximum of STATIONCOUNT to determine how large an array you will need.

data have ;
  input concat :$70. _FREQ_ txmax tymax tzmax txmin tymin tzmin stationcount ;
datalines4;
US000S/US000V 79 30 -90 120 -200 -280 70 2
US000S/US000V/US001E 832 20 -110 120 -200 -270 70 3
US000S/US000V/US001E/US001Q 337 -10 -130 120 -200 -310 70 4
US000S/US000V/US001Q 5 -10 -280 120 -20 -300 80 3
US000S/US001E 37 -30 -120 120 -220 -350 70 2
US000S/US001E/US001Q 2115 -40 -120 120 -270 -360 70 3
US000S/US001Q 773 -210 -140 120 -390 -340 70 2
US000U/US000V 20 50 20 120 30 -40 90 2
US000V/US001E 514 40 -100 120 -170 -220 70 2
US000V/US001R 101 -110 0 120 -160 -50 70 2
US001E/US001Q 343 -90 -330 120 -270 -400 70 2
;;;;

proc sql noprint;
 select max(stationcount) into :stations trimmed 
 from have
 ;
quit;

data want;
  set have ;
  array STA [&stations] $20;
  do index=1 to stationcount;
    sta[index] = scan(concat,index,'/');
  end;
  drop index;
run;

proc print data=want;
run;

Results:
