Run SAS array when array value is not consecutive

Petergao1 · Posted 07-14-2020 03:05 PM

Hi friends,

I have a question to ask. When one runs an array job, the array's dimension-size

value must be consecutive, like from 5-9 (5,6,7,8,9). I have a data set for schools with subgroups. which is not consecutive. Its value is 1,4,5,6,7,8,9,13,14,25, in long format.

In order for the array to run,I have to create another variable, called group, to re-assign value to be consecutive, like, 1,2,3,4,5,6,7,8,9,10, (when subgroup=1, group=1, subgroup=4, group=2, subgroup=5, group=3, etc. Otherwise my array won't run properly.

This program is to turn data file from long to wide, to control the output, to avoid using a transpose, which is done so in the below codes.

Problem: After my data set was processed, I have to rename my output columns back to subgroup values, which is additional work and tedious.

My question: is there some way that one can use original data' s value (nonconsecutive) to run an array and get the job done?

*Note I have a data step to create a new var (group) before I can run my data step with an array;

*Also the file is sorted by sch_id and group;

Here is my program that is tested and working:

DATA want_wide ; *create a new data file name to be in wide format;
SET have_long; *read in the existing data file in long format;
By sch_id ; *read in data by the order of sch_id-original data each school has 10 rows (or less) of data;

KEEP sch_id EM_ELA_statusCY1 - EM_ELA_statusCY10; *keep output data's these column only;
RETAIN sch_id EM_ELA_statusCY1 - EM_ELA_statusCY10 ; *keep output data's column in this order;

ARRAY emstatus (1:10) $ EM_ELA_statusCY1 - EM_ELA_statusCY10 ; *process data by group value, from 1 to 10;
*based on sh_id, do the data step by the order of group id value;
IF first.sch_id THEN DO;
DO i = 1 to 10 ;
emstatus ( i ) = "" ; *output data is character type;
*array's order is based on i's value;
*above defaulted array index output to be missing (" "), when it has data,then it will be replaced;
END;*finish array execution;
END;* finish the data round beginning with sch_id do loop;
emstatus ( group ) = EM_ELA_statusCY ; *this is the column to be transposed;
* this will assign output new column name (EM subject=ELA) combined with group value for each new column;
IF last.sch_id THEN OUTPUT ;*when reached the last sch_id, output data;
RUN; *It worked, got the output as desired-long data set become wide, one school per row with 10 new columns;

* After this I have to rename the 10 columns back to the subgroup value, say

EM_ELA_statusCY8=EM_ELA_statusCY13,

EM_ELA_statusCY9= EM_ELA_statusCY14

EM_ELA_statusCY10=EM_ELA_statusCY25,etc.;

*Since I have 40 such columns, renaming them can be time consuming, the example only show 10 of them, there are three other sets of data, each set will produce 10 columns;

Any idea to go around so I do not need to create a new group var, and after the job rename my output columns back to subgroups' value?

Thank you.

Curious Peter

Reeza · Posted 07-14-2020 03:10 PM

Why not use TRANSPOSE instead of avoiding it?

One of the key benefits of TRANSPOSE is the ID and IDLABEL statement which easily allow you to control the new variable names as well as the variable labels.

Petergao1 · Posted 07-14-2020 03:40 PM

I have program that run the transpose job. Now I am seeking alternative way to have it done differently.

Kurt_Bremser · Posted 07-15-2020 03:33 AM

@Petergao1 wrote:

I have program that run the transpose job. Now I am seeking alternative way to have it done differently.

PROC TRANSPOSE is the way to go for this. Don't waste brain cycles ín a futile attempt to reinvent the wheel. Use your time for more useful things.

Just compare this:

data have;
input sch_id $ subgroup EM_ELA_statusCY $;
datalines;
A 1 abc
A 13 def
A 14 ghi
A 22 xyz
;

proc transpose data=have out=want prefix=EM_ELA_statusCY;
by sch_id;
var EM_ELA_statusCY;
id subgroup;
run;

with your complicated array code, the need to convert the subgroups to a continuous series, and the re-translation from that. Which one of those codes will you want to maintain in the future?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

PaigeMiller · Posted 07-14-2020 03:18 PM

I like @Reeza 's suggestion.

Alternatively:

If you have variables var1, var4, var5, var6, var7, var8, var9, var13, var14, var25, this should work (note the double-dash)

ARRAY emstatus $ var1--var25 ;

Also, I haven't gone through your code in detail, but usually converting long to wide just makes your problem worse. Usually long data sets are much easier to handle and analyze.

--
Paige Miller

Petergao1 · Posted 07-14-2020 03:43 PM

Thanks for the suggestion. My data is in long format and there is one column called subgroup with values of 1,4,5,6,7,8,9,13,14,25; another column is the Em ELA status. that is why and where I encounter problem if I use the subgroup values as my array's dimension.

Reeza · Posted 07-14-2020 04:12 PM

Another option is to ignore the missing indexes. Create all the variables, even the empty ones. At the end, do a check for variables that are entirely empty and drop them.

PaigeMiller · Posted 07-14-2020 05:34 PM

@Petergao1 wrote:

Thanks for the suggestion. My data is in long format and there is one column called subgroup with values of 1,4,5,6,7,8,9,13,14,25; another column is the Em ELA status. that is why and where I encounter problem if I use the subgroup values as my array's dimension.

Everything you said points to not transposing the data, leaving it long instead of wide. But you'd have to tell us what analysis you are planning with this data to be sure.

Also, if you TRANSPOSE the data, there shouldn't be any problems using the "double-dash" method I showed.

So I'm not sure what your resistance is to either idea.

--
Paige Miller

Reeza · Posted 07-14-2020 06:07 PM

I have program that run the transpose job. Now I am seeking alternative way to have it done differently.

Given that this seems like more of an academic/learning exercise.

ballardw · Posted 07-14-2020 03:48 PM

Quite often it is a very good idea to provide example starting data, the desired output and the rules involved.

Transpose is likely an option.

Some ideas:

You have code like this to set all elements of an array to missing:

DO i = 1 to 10 ;
emstatus ( i ) = "" ; *output data is character type;
*array's order is based on i's value;
*above defaulted array index output to be missing (" "), when it has data,then it will be replaced;
END;*finish array execution;

The do loop is unneeded you can accomplish the exact same thing with:

call missing( of emstatus(*));

Call missing sets the values of all variables, and yes a mix of character and numeric is allowed, in the parentheses to missing for the variable type. You can process all elements of an array with the "of arrayname(*) " syntax.

One option might be to consider a custom custom format that would map your desired group values (still not quite sure why the word "group" is used ). Then you could use index values like varname[ functname(indexvalue)] to get the "group" .

But likely the introduction of additional variables to create the needed names with the ID statement in Proc Transpose is going to much easier.

Petergao1 · Posted 07-14-2020 04:52 PM

Thanks again for the suggestion. I don't mind the array will run a dimension from 1, 2,3,... till 25. And then I can drop those empty cases. But I do not know how to do that.

Here is the data structure (I took the first one case) and the first status column to process it into wide format. Th out put will be one row for one school with additional 10 columns, each columns is a group's status for a subject, like ELA.

sch_id	groupid	group	EM_ELA_StatusCY	EM_Math_StatusCY	HS_ELA_StatusCY	HS_Math_StatusCY
1	1	1	GS:MP	GS:MP	-	-
1	4	2	-	-	-	-
1	5	3	-	-	-	-
1	6	4	-	-	-	-
1	7	5	-	-	-	-
1	8	6	-	-	-	-
1	9	7	GS:MP	GS:MP	-	-
1	13	8	-	-	-	-
1	14	9	GS:WAA	GS:WAA	-	-
1	25	10	-	-	-	-

Reeza · Posted 07-14-2020 05:31 PM

If you do a quick search on dropping empty columns on here you'll find that it's a question that gets ask and answered at least once a month I'd guess.

Petergao1 · Posted 07-16-2020 09:44 AM

I wished I can do that. I am new to this group and like to see your previous example or answers. Thanks!

Reeza · Posted 07-16-2020 01:04 PM

Learning how to search is worth the time.

Search terms used in the giant 'Find your SAS answers' above:

"delete variable missing"

https://communities.sas.com/t5/forums/searchpage/tab/message?advanced=false&allow_punctuation=false&...

@Petergao1 wrote:

I wished I can do that. I am new to this group and like to see your previous example or answers. Thanks!

ballardw · Posted 07-14-2020 06:27 PM

@Petergao1 wrote:

Thanks again for the suggestion. I don't mind the array will run a dimension from 1, 2,3,... till 25. And then I can drop those empty cases. But I do not know how to do that.

Here is the data structure (I took the first one case) and the first status column to process it into wide format. Th out put will be one row for one school with additional 10 columns, each columns is a group's status for a subject, like ELA.

sch_id groupid group EM_ELA_StatusCY EM_Math_StatusCY HS_ELA_StatusCY HS_Math_StatusCY

1 1 1 GS:MP GS:MP - -

1 4 2 - - - -

1 5 3 - - - -

1 6 4 - - - -

1 7 5 - - - -

1 8 6 - - - -

1 9 7 GS:MP GS:MP - -

1 13 8 - - - -

1 14 9 GS:WAA GS:WAA - -

1 25 10 - - - -

Are all those dashes indicating missing data? I am not sure why you make things "wide" with missing data.

Maybe it is time to ask what this somewhat ugly data set is to be used for. And possibly where that particular structure originate?

With the column heading you show in that source this starts to sound like a REPORT may be easier to make then a very ugly data set if the data is not used for further analysis.

Run SAS array when array value is not consecutive

Re: Run SAS array when array value is not consecutive

Re: Run SAS array when array value is not consecutive

Re: Run SAS array when array value is not consecutive

Re: Run SAS array when array value is not consecutive

Re: Run SAS array when array value is not consecutive

Re: Run SAS array when array value is not consecutive

Re: Run SAS array when array value is not consecutive

Re: Run SAS array when array value is not consecutive

Re: Run SAS array when array value is not consecutive

Re: Run SAS array when array value is not consecutive

Re: Run SAS array when array value is not consecutive

Re: Run SAS array when array value is not consecutive

Re: Run SAS array when array value is not consecutive

Re: Run SAS array when array value is not consecutive

Registration is open

Registration is open

SAS Training: Just a Click Away