BookmarkSubscribeRSS Feed
JimDillon
Calcite | Level 5

Attached is the log from a SAS program. The code could probably be a lot more efficient but ...

My goal is to see what jobs haven't been run in the past year.

 

I'm reading in a lot of data so the information at the bottom of page 12 (408619 observations) appears to be correct.  However the log text near the bottom of page 23 (end of data set TASKS) doesn't seem quite correct - if, for each observation in data set PDB, I increment a counter, I should have 1 observation in data set TASKS with 579 variables, and not 407419 observations.

Why so many observations?

 

15 REPLIES 15
Shmuel
Garnet | Level 18

I have selected lines from your log and you have probably missed something:

 

The data set WORK.PDBIN has 408619 observations and 4 variables

The data set WORK.TASKS has 407419 observations and 578 variables.

The data set WORK.TASKNAMES has 407419 observations and 579 variables.

1400 PROC SORT DATA=TaskNames NODUPS;
1401 BY TASKNAME;
1402 RUN;

NOTE: There were 407419 observations read from the data set WORK.TASKNAMES.
NOTE: 407238 duplicate observations were deleted.
NOTE: The data set WORK.TASKNAMES has 181 observations and 579 variable

 

 

So what is your exact question?

Try to post the first step making the issue.

JimDillon
Calcite | Level 5

Shmuel,

Per your question on what I'm trying to accomplish.  I have a list of tasks that are permitted to execute through our security product.  Over the years we have stopped running some but have not removed their definitions from the security product.  I'm going through now to see if these tasks have run in the past year/ two years, and if they have not, remove them from the security product.  I have the list of the definitions for these tasks in the security product.  I also have a list of of what tasks (and many other jobs) that have actually ran in the past year/ two years.  My idea was to bump the list of 578 permitted tasks against what has actually run (408619 objects) and if there is no record of it having been run, remove it from the security product.  

Kurt_Bremser
Super User

So you have your list of jobs allowed in a dataset. Sort that by the jobname.

Sort the jobs that gave run with nodupkey.

Merge the two datasets, and keep those that are in allowed, but not in pdbin.

Shmuel
Garnet | Level 18

You can select distinct names of jobs run per year (let's say 2018 and 2019)

then you can select all those jobs which run in 2018 but not in 2019,

thus you get directly what you want - list of jobs not used any more.

 

adapt years as need.

FreelanceReinh
Jade | Level 19

Hello @JimDillon and welcome to the SAS Support Communities!

 

There's nothing in this DATA step that prevents the implied OUTPUT statement from writing one observation per DATA step iteration to dataset Tasks, resulting in the same number of observations as have been read from PDBIN. You can wrap a DO-UNTIL loop around the SET statement to obtain a single observation, as shown in the example below:

data want(keep=t1-t3);
do until(last);
  set sashelp.class end=last;
  select;
    when(name=:'A') t1+1;
    when(name=:'J') t2+1;
    when(name=:'M') t3+1;
    otherwise;
  end;
end;
run;

Note that neither an array nor an initialization are needed. The latter is due to the use of sum statements (t1+1) instead of ordinary assignment statements (t1=t1+1). They initialize the variables (t1, etc.) to zero by default.

Kurt_Bremser
Super User

You have no explicit output statement in the data step that creates TASKS, so it will write one observation for every observation read from the input dataset.

 

But you seem to have never been introduced to procedures like FREQ and SUMMARY. To create a count of your jobs, just use PROC FREQ on table PDBIN.

JimDillon
Calcite | Level 5
Kurt, thanks for your suggestions on the procedures. However, I'm looking in the Base SAS 9.4 Procedures Guide and don't find the FREQ procedure - i do see the SUMMARY proc which also references the MEANS proc.
Kurt_Bremser
Super User

@JimDillon wrote:
Kurt, thanks for your suggestions on the procedures. However, I'm looking in the Base SAS 9.4 Procedures Guide and don't find the FREQ procedure - i do see the SUMMARY proc which also references the MEANS proc.

The FREQ procedure is under "Base SAS Procedures Guide. Statistical Procedures", The FREQ Procedure

Kurt_Bremser
Super User

A quick example for creating a count with FREQ:

data pdbin;
input job :$8.;
datalines;
ABENDAID
AIITAS
AIITCP
AIITAS
ABENDAID
;

proc freq data=pdbin noprint;
tables job /out=want (drop=percent);
run;
ballardw
Super User

Any time you use a structure like:

Data new;
   set old;
  <do something>
run;

The data step processes every record in the data set OLD. If there were 100 records the <do something> code gets executed for each record, resulting in 100 records unless you do something to explicitly add records or only output the data for certain records (If <condition> then output;>

Since you read the entire input data set you get each record out with the variables incremented.

 

It might be a good idea to describe what you were attempting as this looks like a not very efficient attempt to count something and likely could be done with proc freq and/or transpose.

Your select is likely not doing what you expect anyway.

You show code like:

189          WHEN (JOB =: 'AS')       T19 = T19 + 1;
190          WHEN (JOB =: 'ASCH')     T20 = T20 + 1;

If you intend the =: to be the "begins with" then "ASCH<anytext>" will begin with 'AS' and get in the T19 slot.

Example:

data example;
   input job :$10.;
   array t(6);
   do i=1 to dim(t); t[i]=0;end;
   select;
      WHEN (JOB =: 'DSN')      T1 = T1 + 1;
      WHEN (JOB =: 'DSNCDBM1') T2 = T2 + 1;
      WHEN (JOB =: 'DSNCDIST') T3 = T3 + 1;
      WHEN (JOB =: 'DSNCIRLM') T4 = T4 + 1;
      WHEN (JOB =: 'DSNCMSTR') T5 = T5 + 1;
      WHEN (JOB =: 'DSNCWLMC') T6 = T6 + 1;
      otherwise ;
   end;
datalines;
DSN
DSNCDBM1
DSNCDIST
DSNCIRLM
DSNCMSTR
DSNCWLMC
;

Note that the only "counter" that gets incremented is for DSN.

If you don't mean to use "begins with" then you really need to be looking at Proc freq.

Or you may want a multilabel format based on the Job variable that would allow a procedure like Proc Tabulate or Report to show counts with the "DSN" group plus all the other "DSNxxx" job values.

Shmuel
Garnet | Level 18

I have selected few more rows from the log:

The data set WORK.PDBIN has 408619 observations and 4 variables

154         PROC SORT DATA=PDBIN NODUPS;
155          BY JOB;
156         RUN;

NOTE: There were 408619 observations read from the data set WORK.PDBIN.
NOTE: 1200 duplicate observations were deleted.
NOTE: The data set WORK.PDBIN has 407419 observations and 4 variables.
The data set WORK.TASKS has 407419 observations and 578 variables.

The data set WORK.TASKNAMES has 407419 observations and 579 variables.

1400        PROC SORT DATA=TaskNames NODUPS;
1401         BY TASKNAME;
1402        RUN;

NOTE: There were 407419 observations read from the data set WORK.TASKNAMES.
NOTE: 407238 duplicate observations were deleted.
NOTE: The data set WORK.TASKNAMES has 181 observations and 579 variables.

according to log you read 408619 rows from PDBIN.

there were 1200 duplicate JOBs.

clearing duplicates you remained 407419 JOBs.

 

then you counted tasks per selected jobs.

if you want only those jobs, you need a change in next step:

DATA Tasks (KEEP = /*TASKARRAY*/ JOB T1-T578);
Set PDBIN;
  BY JOB;    /* line added */
       Array TASKARRAY{578} T1-T578;
           /* initiallize the array */
          do i=1 to 578;
              TASKARRAY{i} = 0;
           end;
/* Search the PDB and for each task by that name found, increment counter */
SELECT;
    WHEN (JOB =: 'ABENDAID') T1 = T1 + 1;
     WHEN (JOB =: 'AIITAS')   T2 = T2 + 1;
    ........
     WHEN (JOB =: 'ZVIEW')    T578 = T578 + 1;
     OTHERWISE delete;   /* delete other jobs from input */
 END;
 Run;   
Shmuel
Garnet | Level 18
but you don't need all those counters as each job has its own counter.
you just need:
data tasks;
set PDBIN;
by JOB;
if first.job then counter+1;
if JOB in (<list of JOBs) then output;
run;
JimDillon
Calcite | Level 5
Scmuel, thanks for your suggestion. I tried your first suggestion but I didn't see a decrease in the number of output observations. I then considered your second suggestion but it doesn't seem to support the 'begins with' concept that I have in place - and as pointed out, needs a little work - I don't know (as BallardW pointed out) if I have both "JOB =: 'AS'" and "JOB =: 'ASCH'" then the first would include the second - but how many jobs start with 'AS' other than 'ASCH'?

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 15 replies
  • 3351 views
  • 0 likes
  • 5 in conversation