BookmarkSubscribeRSS Feed
Petergao1
Fluorite | Level 6

Hi friends,

I have a question to ask. When one runs an array job, the array's dimension-size

value must be consecutive, like from 5-9 (5,6,7,8,9). I have a data set for schools with subgroups. which is not consecutive. Its value is 1,4,5,6,7,8,9,13,14,25, in long format.

In order for the array to run,I have to create another  variable, called group, to re-assign value to be consecutive, like, 1,2,3,4,5,6,7,8,9,10, (when subgroup=1, group=1, subgroup=4, group=2, subgroup=5, group=3, etc. Otherwise my array won't run properly.

This program is to turn data file from long to wide, to control the output, to avoid using a transpose, which is done so in the below codes.

Problem: After my data set was processed, I have to rename my output columns back to subgroup values, which is additional work and tedious. 

My question: is there some way  that one can use original data' s value (nonconsecutive) to run an array and get the job done?

*Note I have a data step to create a new var (group) before I can run my data step with an array;

*Also the file is sorted by sch_id and group;

Here is my program that is tested and working:

 

DATA want_wide ; *create a new data file name to be in wide format;
SET have_long; *read in the existing data file in long format;
By sch_id ; *read in data by the order of  sch_id-original data each school has 10 rows (or less) of data;

KEEP sch_id EM_ELA_statusCY1 - EM_ELA_statusCY10; *keep output data's these column only;
RETAIN  sch_id EM_ELA_statusCY1 - EM_ELA_statusCY10 ; *keep output data's column in this order;

ARRAY emstatus (1:10) $ EM_ELA_statusCY1 - EM_ELA_statusCY10 ; *process data by group value, from 1 to 10;
*based on sh_id, do the data step by the order of  group id value;
IF first.sch_id THEN DO;
DO i = 1 to 10 ;
emstatus ( i ) = "" ; *output data is character type;
*array's order is based on i's value;
*above defaulted array index output to be missing (" "), when it has data,then it will be replaced;
END;*finish array execution;
END;* finish the data round beginning with sch_id do loop;
emstatus ( group ) = EM_ELA_statusCY ; *this is the column to be transposed;
* this will assign output new column name (EM subject=ELA) combined with group value for each new column;
IF last.sch_id THEN OUTPUT ;*when reached the last sch_id, output data;
RUN; *It worked,  got the output as desired-long data set become wide, one school per row with 10 new columns;

 

* After this I have to rename the 10 columns back to the subgroup value, say 

EM_ELA_statusCY8=EM_ELA_statusCY13,

EM_ELA_statusCY9= EM_ELA_statusCY14

EM_ELA_statusCY10=EM_ELA_statusCY25,etc.;

 

*Since I have 40 such columns, renaming them can be time consuming, the example only show 10 of them, there are three other sets of data, each set will produce 10 columns;

 

Any idea to go around so I do not need to create a new group var, and after the job rename my output columns back to subgroups' value?

 

Thank you.

 Curious Peter

16 REPLIES 16
Reeza
Super User
Why not use TRANSPOSE instead of avoiding it?

One of the key benefits of TRANSPOSE is the ID and IDLABEL statement which easily allow you to control the new variable names as well as the variable labels.
Petergao1
Fluorite | Level 6

I have program that run the transpose job. Now I am seeking alternative way to have it done differently.

Kurt_Bremser
Super User

@Petergao1 wrote:

I have program that run the transpose job. Now I am seeking alternative way to have it done differently.


PROC TRANSPOSE is the way to go for this. Don't waste brain cycles ín a futile attempt to reinvent the wheel. Use your time for more useful things.

Just compare this:

data have;
input sch_id $ subgroup EM_ELA_statusCY $;
datalines;
A 1 abc
A 13 def
A 14 ghi
A 22 xyz
;

proc transpose data=have out=want prefix=EM_ELA_statusCY;
by sch_id;
var EM_ELA_statusCY;
id subgroup;
run;

with your complicated array code, the need to convert the subgroups to a continuous series, and the re-translation from that. Which one of those codes will you want to maintain in the future?

PaigeMiller
Diamond | Level 26

I like @Reeza 's suggestion.

 

Alternatively:

If you have variables var1, var4, var5, var6, var7, var8, var9, var13, var14, var25, this should work (note the double-dash)

 

ARRAY emstatus $ var1--var25 ;

Also, I haven't gone through your code in detail, but usually converting long to wide just makes your problem worse. Usually long data sets are much easier to handle and analyze.

--
Paige Miller
Petergao1
Fluorite | Level 6

Thanks for the suggestion. My data is in long format and there is one column called subgroup with values of 1,4,5,6,7,8,9,13,14,25; another column is the Em ELA status. that is why  and where I encounter problem if I use the subgroup values as my array's dimension.

Reeza
Super User
Another option is to ignore the missing indexes. Create all the variables, even the empty ones. At the end, do a check for variables that are entirely empty and drop them.
PaigeMiller
Diamond | Level 26

@Petergao1 wrote:

Thanks for the suggestion. My data is in long format and there is one column called subgroup with values of 1,4,5,6,7,8,9,13,14,25; another column is the Em ELA status. that is why  and where I encounter problem if I use the subgroup values as my array's dimension.


Everything you said points to not transposing the data, leaving it long instead of wide. But you'd have to tell us what analysis you are planning with this data to be sure.

 

Also, if you TRANSPOSE the data, there shouldn't be any problems using the "double-dash" method I showed.

 

So I'm not sure what your resistance is to either idea.

--
Paige Miller
Reeza
Super User

I have program that run the transpose job. Now I am seeking alternative way to have it done differently.

Given that this seems like more of an academic/learning exercise.

ballardw
Super User

Quite often it is a very good idea to provide example starting data, the desired output and the rules involved.

 

Transpose is likely an option.

 

Some ideas:

You have code like this to set all elements of an array to missing:

DO i = 1 to 10 ;
emstatus ( i ) = "" ; *output data is character type;
*array's order is based on i's value;
*above defaulted array index output to be missing (" "), when it has data,then it will be replaced;
END;*finish array execution;

The do loop is unneeded you can accomplish the exact same thing with:

call missing( of emstatus(*));

Call missing sets the values of all variables, and yes a mix of character and numeric is allowed, in the parentheses to missing for the variable type. You can process all elements of an array with the "of arrayname(*) " syntax.

 

One option might be to consider a custom custom format that would map your desired group values (still not quite sure why the word "group" is used ). Then you could use index values like  varname[ functname(indexvalue)] to get the "group" .

 

But likely the introduction of additional variables to create the needed names with the ID statement in Proc Transpose is going to much easier.

 

 

 

 

Petergao1
Fluorite | Level 6

Thanks again for the suggestion. I don't mind the array will run a dimension from 1, 2,3,... till 25. And then I can drop those empty cases. But  I do not know how to do that.

 

Here is the data structure (I took the first one case) and the first status column to process it into wide format. Th out put will be one row for one school with additional 10 columns, each columns is a group's status for a subject, like ELA.

 

sch_idgroupidgroupEM_ELA_StatusCYEM_Math_StatusCYHS_ELA_StatusCYHS_Math_StatusCY
111GS:MPGS:MP--
142----
153----
164----
175----
186----
197GS:MPGS:MP--
1138----
1149GS:WAAGS:WAA--
12510----

 

Reeza
Super User
If you do a quick search on dropping empty columns on here you'll find that it's a question that gets ask and answered at least once a month I'd guess.
Petergao1
Fluorite | Level 6

I wished I can do that. I am new to this group and like to see your previous example or answers. Thanks!

Reeza
Super User

Learning how to search is worth the time. 

Search terms used in the giant 'Find your SAS answers' above:

"delete variable missing"

 

https://communities.sas.com/t5/forums/searchpage/tab/message?advanced=false&allow_punctuation=false&...

 


@Petergao1 wrote:

I wished I can do that. I am new to this group and like to see your previous example or answers. Thanks!


 

ballardw
Super User

@Petergao1 wrote:

Thanks again for the suggestion. I don't mind the array will run a dimension from 1, 2,3,... till 25. And then I can drop those empty cases. But  I do not know how to do that.

 

Here is the data structure (I took the first one case) and the first status column to process it into wide format. Th out put will be one row for one school with additional 10 columns, each columns is a group's status for a subject, like ELA.

 

sch_id groupid group EM_ELA_StatusCY EM_Math_StatusCY HS_ELA_StatusCY HS_Math_StatusCY
1 1 1 GS:MP GS:MP - -
1 4 2 - - - -
1 5 3 - - - -
1 6 4 - - - -
1 7 5 - - - -
1 8 6 - - - -
1 9 7 GS:MP GS:MP - -
1 13 8 - - - -
1 14 9 GS:WAA GS:WAA - -
1 25 10 - - - -

 


Are all those dashes indicating missing data? I am not sure why you make things "wide" with missing data.

 

Maybe it is time to ask what this somewhat ugly data set is to be used for. And possibly where that particular structure originate?

With the column heading you show in that source this starts to sound like a REPORT may be easier to make then a very ugly data set if the data is not used for further analysis.

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 16 replies
  • 3477 views
  • 4 likes
  • 6 in conversation