BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
PaulBear
Calcite | Level 5

Hello,

 

I have a long dataset that has 20,000 rows. These rows represent test scores by grade and subject for various schools. So this is what the file layout looks like:

 

SCHOOL, GRADE, SUBJECT, SCORE

 

In the 20,000 rows there are about 550 schools.

 

I want to be able to list every score combination based on grade and subject by school on the same row. So, in essence I want to have about 550 rows showing all the test score combinations for each school in wide format. I think that I could do this in a PROC Transpose statement but would there be a way to do this in an array or a macro statement?

 

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

One key difference between an array method and transpose is the dynamic nature. 

Proc Transpose doesn't require any prior knowledge about the number of levels while an array method will need the number of variables ahead of time. 

A work around for this is to create a macro variable with the number of variables requires if this information is not easily available. 

 

To implement this workaround you can use a SQL step prior to get the number of vars required.

 

 

proc sql noprint;
create table n_subjects as
select count(subject) as n_subject
from have
group by school;
select max(n_subject) into :n_subject
from n_subject;
quit;

%put &n_subject;

data want;
set have;

array subjects(&n_subject) $ 50 subject1-subject&n_subject.;
...

run;

 

View solution in original post

8 REPLIES 8
Reeza
Super User

Ahhhh....why? 

 

Your data at is in a useable format now, the wide format will be cumbersome. 

What are you expecting to gain from transposing the data?

PaulBear
Calcite | Level 5

We are required to provide a file where all school information is on one row. Currently one school, if they have 30 test and grade combinations would have 30 rows - of course some of these are different based on how many tests and grades a school has.

Reeza
Super User

Since your transposing it you can use either an array or transpose. An array method is faster since it is one step but proc transpose is three steps, one transpose for each of subject, grade, score and then merge by school. 

The transpose is easier to program though

PaulBear
Calcite | Level 5

Thank you for your response. I think I would know how to do the Transpose but as I am trying to broaden my SAS programming skills, I want to understand how I could do this in an array?

Reeza
Super User

Here's a write up on the various scenarios. 

 

http://www.ats.ucla.edu/stat/sas/modules/longtowide_data.htm

Reeza
Super User

One key difference between an array method and transpose is the dynamic nature. 

Proc Transpose doesn't require any prior knowledge about the number of levels while an array method will need the number of variables ahead of time. 

A work around for this is to create a macro variable with the number of variables requires if this information is not easily available. 

 

To implement this workaround you can use a SQL step prior to get the number of vars required.

 

 

proc sql noprint;
create table n_subjects as
select count(subject) as n_subject
from have
group by school;
select max(n_subject) into :n_subject
from n_subject;
quit;

%put &n_subject;

data want;
set have;

array subjects(&n_subject) $ 50 subject1-subject&n_subject.;
...

run;

 

TomKari
Onyx | Level 15

On the one hand, I admire and support your desire to expand your SAS programming skills. However, I've been in your type of situation a number of times, and I am absolutely convinced that the PROC TRANSPOSE approach is easier, cleaner, and more supportable.

 

If you were working for me, I'd insist on you doing it that way, as I know that down the road I would have fewer problems. It's just an approach that naturally mirrors the nature of the data. And we're in the business to solve problems cost-effectively.

 

I guarantee, you'll have the opportunity to learn arrays some day, when THEY'RE the optimal approach!

 

Tom

PGStats
Opal | Level 21

Are you required to produce a dataset or a report table? If the latter, look at proc tabulate output:

 

proc tabulate data=myData format=5.2;
class school grade subject;
var score;
table school, grade*subject*mean=""*score="";
run;
PG

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1650 views
  • 4 likes
  • 4 in conversation