Help using Base SAS procedures

How to run the same code for different data sets?

Accepted Solution Solved
Reply
Contributor
Posts: 65
Accepted Solution

How to run the same code for different data sets?

Hi,

I have two questions and I would appreciate if you help me.

1- I have more than 240 data sets with the same variables, each of which contains about 2 millions observations. How can I run the same statement for all of them without merging them together? For example, when I want to define a dummy variable for one of the variables that exists in all data sets, how can I define the dummy variable one time and apply it for all of the data sets?

2- As I mentioned, each of the data sets contains about 2 millions observations. Suppose that he data set contains the information of students of a university. We have three observations: name of the students, Exam#n, scores. How can I ask the SAS to calculate the average scores of each persons in all the exams she has taken and put the results in another column, like average scores?

Thanks,


Accepted Solutions
Solution
‎10-08-2014 08:16 PM
Super User
Posts: 17,784

Re: How to run the same code for different data sets?

You can't really just add a column to a table, you end up re-writing the whole table so it's inefficient, which is why I was asking what the rest of your code was.

anyways, you can use proc sql to add in the average:

proc sql;

create table want as

select *, avg(age) as avg_age

from sashelp.class

group by sex;

quit;

proc print; run;

View solution in original post


All Replies
Contributor
Posts: 23

Re: How to run the same code for different data sets?

As for #1... use a macro. Say you have datasets ds001, ds002, ab01b and you want to do the dummy variable assignment on all of them. Here is what your code may look like:

%MACRO repeat(dslist=);

%DO i=1 %TO %SYSFUNC(COUNTW(&dslist));

     %LET ds = %SCAN(&dslist, &i);

     DATA &ds;

          SET &ds;

          SELECT (var1);

               WHEN ("Yes") dum1=1;

               WHEN("No")   dum2=1;

               OTHERWISE dum3=1;

          END;

     RUN;

%END;

%MEND repeat;

Calling the macro:

%repeat(dslist= ds001 ds002 ab01b);

Or you could open the directory where all of the files are in a null data step and then call a similar macro definition except without the DO LOOP.

Message was edited by: Lauren Parlett

Super User
Posts: 17,784

Re: How to run the same code for different data sets?

How you have access to the names of all 240 data sets will be a key as to how to efficiently call the macro's

1. Why do you need a dummy variable? SAS has class statements that allow you to avoid having to create dummy variables usually.

2. Did you need the average score added in to the original 2 million row data set or to a different data set.

Contributor
Posts: 65

Re: How to run the same code for different data sets?

1- Thank you for mentioning the class statement. Very helpful! But I still need to call some certain observations from every data set, for example to do some statistic analysis on them.

2- I need them to add in the original data sets.

Valued Guide
Posts: 3,208

Re: How to run the same code for different data sets?

How many variables sizing more detailed estimates for the output.

The class statement is a good one an d may there are more to think on.

I made just a mistake and my test dataset (UE) became 5Gn with 130M records still running within 10 minutes.

Better to develop something with small data with bug data in mind. 480M records start feeling as big data. 

---->-- ja karman --<-----
Contributor
Posts: 65

Re: How to run the same code for different data sets?

44 variables. I am working with just one of the data sets, but I need to write my code in a way that can work for all data sets.

Valued Guide
Posts: 3,208

Re: How to run the same code for different data sets?

You can add them in view with where processing .....  Please do a more complete description.

240 datesets 2M records each 3 variables name-student Exam score (type & length)

What certain observations do you need (where processing?)  what is some statistics  adding (proc append? or adding/updating a column)

What are your hardware restrcitions. SAs version.

---->-- ja karman --<-----
Contributor
Posts: 65

Re: How to run the same code for different data sets?

student case was just an example. Each data set contains 44 VARs.

Super User
Posts: 17,784

Re: How to run the same code for different data sets?

Perhaps post your code for your one data set and we can help generalize. Again, how do you plan to refer to each of the 240 data sets, is there a naming convention of sorts? I'm strongly against adding 44 new variables to datasets that are 20 million in size already so if you explain more perhaps there maybe more efficient methods to accomplish what you want.

Contributor
Posts: 65

Re: How to run the same code for different data sets?

The data sets contains some private information. However the logic is the same as student case below:

NameNumber
of exam
ScoreAverage Score
Alice 180
Alice 2100
Alice 390
Alice 470
Peter175
Peter271
Peter367
Peter463
Peter559
John155
John251
John347
John443
John539
John635

What I want to do is calculating the average score for each person and putting it in the average score column.

The data set contains 44 VARs, but I need to do this operation just for one of them. So, I will add just one more variable.

Solution
‎10-08-2014 08:16 PM
Super User
Posts: 17,784

Re: How to run the same code for different data sets?

You can't really just add a column to a table, you end up re-writing the whole table so it's inefficient, which is why I was asking what the rest of your code was.

anyways, you can use proc sql to add in the average:

proc sql;

create table want as

select *, avg(age) as avg_age

from sashelp.class

group by sex;

quit;

proc print; run;

Contributor
Posts: 65

Re: How to run the same code for different data sets?

Thank you.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 1813 views
  • 9 likes
  • 4 in conversation