DATA Step, Macro, Functions and more

Delete first 5 rows for every dataset in WORK library

Accepted Solution Solved
Reply
Regular Contributor
Posts: 174
Accepted Solution

Delete first 5 rows for every dataset in WORK library

[ Edited ]

I want to append all the datasets I have in my WORK library together. They all have the same column names, but they start on row 6. I have too many datasets to delete first 5 rows with a macro. Is there a good shortcut I can use for this, similar to the code I use to append them to 1 dataset:

 


PROC SQL;
    SELECT distinct MEMNAME INTO : MEMNAMES SEPARATED BY ' ' from dictionary.tables where libname='WORK';
quit;

DATA full;
    SET &MEMNAMES.;
RUN;


Accepted Solutions
Solution
‎02-23-2018 12:31 PM
Super User
Posts: 8,214

Re: Delete first 5 nows for every dataset in WORK library

include the translate function to convert the spaces to underscores. e.g.:

data test1;
  input (col1-col3) (&$);
  cards;
kjkj  jkjk  ljknl
kjcio  doik  roukg
kjor  ytiukk  lkjf
ruie  eooik  piijmf
item 1  item 2  item 3
1  2  3
4  5  6
;

data test2;
  input (col1-col3) (&$);
  cards;
kjkj  jkjk  ljknl
kjcio  doik  roukg
kjor  ytiukk  lkjf
ruie  eooik  piijmf
item 1  item 2  item 3
7  8  9
2  2  2
;

data _null_;
  array namelist $32. col1-col3;
  set test1(firstobs=5 obs=5);
  length names $32000;
  do over namelist;
    names=catx(' ',names,vname(namelist),'=',
     translate(strip(vvalue(namelist)),'_',' '));
  end;
  call symput('renames',names);
run;

%put &renames.;

proc sql;
  create table dslist as
    select memname from dictionary.tables
      where libname='WORK'
  ;
quit;

data _null_;
  call execute ('data want; set ') ;
  do until (done);
    set dslist end=done;
    call execute (memname);
    call execute("(firstobs=6 rename=(&renames.))");
  end;
  call execute ('; run;') ;
  stop;
run;

Art, CEO, AnalystFinder.com

 

View solution in original post


All Replies
Super User
Posts: 6,921

Re: Delete first 5 nows for every dataset in WORK library

[ Edited ]

Sure.  Let's start with something similar to what you used before:

 

proc sql;

create table dslist as

select memname from dictionary.tables

where libname='WORK';

quit;

 

Instead of creating a macro variable, this creates a data set holding the names of your data sets.  Then construct and execute the rest of the program using:

 

data _null_;

call execute ('data want; set ') ;

do until (done);

   set dslist end=done;

   call execute (memname);

   call execute('(firstobs=6)');

end;

call execute ('; run;') ;

stop;

run;

Regular Contributor
Posts: 174

Re: Delete first 5 nows for every dataset in WORK library

[ Edited ]
Posted in reply to Astounding

Thank you - almost got it. It is selecting the first row as the variable names. Can I edit this to select row 5 as variable names (firstobs is still at 6)?

Super User
Posts: 8,214

Re: Delete first 5 nows for every dataset in WORK library

If all of your files have the same variables, you could just modify @Astounding's code as follows:

data test1;
  input (col1-col3) ($);
  cards;
kjkj jkjk ljknl
kjcio doik roukg
kjor ytiukk lkjf
ruie eooik piijmf
item1 item2 item3
1 2 3
4 5 6
;

data test2;
  input (col1-col3) ($);
  cards;
kjkj jkjk ljknl
kjcio doik roukg
kjor ytiukk lkjf
ruie eooik piijmf
item1 item2 item3
7 8 9
2 2 2
;

data _null_;
  array namelist $32. col1-col3;
  set test1(firstobs=5 obs=5);
  length names $32000;
  do over namelist;
    names=catx(' ',names,vname(namelist),'=',vvalue(namelist));
  end;
  call symput('renames',names);
run;

proc sql;
  create table dslist as
    select memname from dictionary.tables
      where libname='WORK'
  ;
quit;

data _null_;
  call execute ('data want; set ') ;
  do until (done);
    set dslist end=done;
    call execute (memname);
    call execute("(firstobs=6 rename=(&renames.))");
  end;
  call execute ('; run;') ;
  stop;
run;

Of course, that would have to be expanded if the problem is more complex.

 

Art, CEO, AnalystFinder.com

 

Regular Contributor
Posts: 174

Re: Delete first 5 nows for every dataset in WORK library

Hello - I am a little unclear for the array - I need to input the names of the current column names?
Super User
Posts: 8,214

Re: Delete first 5 nows for every dataset in WORK library

Yes! Instead of:

  array namelist $32. col1-col3;

col1-col3 should be replaced with your actual variable names

 

Art, CEO, AnalystFinder.com

 

Regular Contributor
Posts: 174

Re: Delete first 5 nows for every dataset in WORK library

getting an error message at the second data _null_. Can you help me troubleshoot? THank you!

ERROR 214-322: Variable name = is not valid.

ERROR 23-7: Invalid value for the RENAME option.


***error was given per log at the 2nd column name rename

ERROR 79-322: Expecting a =.
Regular Contributor
Posts: 174

Re: Delete first 5 nows for every dataset in WORK library

Ah, it is because the actual variable names in line 5 have spaces...

Solution
‎02-23-2018 12:31 PM
Super User
Posts: 8,214

Re: Delete first 5 nows for every dataset in WORK library

include the translate function to convert the spaces to underscores. e.g.:

data test1;
  input (col1-col3) (&$);
  cards;
kjkj  jkjk  ljknl
kjcio  doik  roukg
kjor  ytiukk  lkjf
ruie  eooik  piijmf
item 1  item 2  item 3
1  2  3
4  5  6
;

data test2;
  input (col1-col3) (&$);
  cards;
kjkj  jkjk  ljknl
kjcio  doik  roukg
kjor  ytiukk  lkjf
ruie  eooik  piijmf
item 1  item 2  item 3
7  8  9
2  2  2
;

data _null_;
  array namelist $32. col1-col3;
  set test1(firstobs=5 obs=5);
  length names $32000;
  do over namelist;
    names=catx(' ',names,vname(namelist),'=',
     translate(strip(vvalue(namelist)),'_',' '));
  end;
  call symput('renames',names);
run;

%put &renames.;

proc sql;
  create table dslist as
    select memname from dictionary.tables
      where libname='WORK'
  ;
quit;

data _null_;
  call execute ('data want; set ') ;
  do until (done);
    set dslist end=done;
    call execute (memname);
    call execute("(firstobs=6 rename=(&renames.))");
  end;
  call execute ('; run;') ;
  stop;
run;

Art, CEO, AnalystFinder.com

 

Super User
Posts: 6,921

Re: Delete first 5 nows for every dataset in WORK library

I can't see what you are looking at.  You will have to show me the first 6 observations from one of the data sets for me to participate in this.  (I wouldn't need all the variables, but 3 or 4 would be necessary.)

 

In general, it would not be a bad idea to do this as two separate steps ... combine all the data values as one step, and go back and rename variables as a second step.  As Art indicated, it can be automated but I can't see what you are looking at.

Trusted Advisor
Posts: 1,389

Re: Delete first 5 nows for every dataset in WORK library

Posted in reply to Astounding

This is a question the justifies use of global options firstobs=, as in:

 

PROC SQL;
  SELECT distinct MEMNAME INTO : MEMNAMES SEPARATED BY ' ' from dictionary.tables where libname='WORK';
quit;

options firstobs=6;
DATA full;
    SET &MEMNAMES.;
RUN;
options firstobs=1;

 

The firstobs=6 option will be applied to every data set subsequently being read in.  After the successful append, reset option to its default (firstobs=1).

 

Regular Contributor
Posts: 174

Re: Delete first 5 nows for every dataset in WORK library

firstobs=6 is working but it is still labeling the columns based on some text in A1 and just default excel column letters B..C..D.... for the rest. firstobs=1 is not changing the column labels to variable names in row 5.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 251 views
  • 5 likes
  • 4 in conversation