BookmarkSubscribeRSS Feed
Emma2021
Quartz | Level 8
I have the following SAS datasets in a folder : dsn_23jan2023 etc. suffix are dates. But sometimes those tables are empty. How can I select the latest non empty data from today’s date? Thank you
8 REPLIES 8
ballardw
Super User

I hope by "in a folder" means that you have a LIBRARY  assigned to reference those data sets.

 

If so you can query the SAS metadata, either SASHELP.Vtable or Dictionary.tables in proc sql to select those data sets, parse the set name for the date value and select the desired one. The question is exactly what "empty" means. Does it mean zero observations or does it mean zero variables or all the variables have missing values?

 

There is variable NOBS that has the number of observations, typically if this is greater than 0 then the set is not "empty" but if you mean variables with missing values that may not be what you want.

Emma2021
Quartz | Level 8
Yes, on my permanent folder (C drive).
I mean there could not many columns (headings), but I want non missing data (at least one observation without missing values at least for one variable).
Thank you
Quentin
Super User

I'm having a hard time understanding the data you have, and your goal.

 

Can should show a few example datasets , i.e. show DATA steps with CARDS to create e.g. dsn_23jan2023 , dsn_24jan2023, dsn_25jan2023?  And then describe the logic for which dataset you would like to select?

 

If all the datasets have the same variables, you're probably better of concatenating them all into a single dataset.  That should make it easier to work with the data.

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
Tom
Super User Tom
Super User

@Emma2021 wrote:
Yes, on my permanent folder (C drive).
I mean there could not many columns (headings), but I want non missing data (at least one observation without missing values at least for one variable).
Thank you

That is a little harder to test for.  You will have to actually read the dataset to tell if there are some non-missing values.  If the test was just that the dataset had any observations you could just check the metadata.

So to find all of the dataset in the folder pointed to by the libref MYLIB that have names in the style DSN_ddmonyyyy and also have observations you could use a query like this.

proc sql;
create table candidates as
  select input(scan(memname,-1,'_'),?date9.) as date format=date9.
       , catx('.',libname,memname) as dsname
       , nobs
       , nlobs
       , nvar
  from dictionary.tables
  where libname='MYLIB'
    and memname like 'DSN^_%' escape '^'
    and nlobs > 0
  having not missing(date)
  order by 1 desc
;
quit;

Now to find the first one that has non missing values you could try checking them one by one in order until you find one.

 

But if the datasets all have the same variables then you could just combine them in descending order by date and use a data step to find the first observation with any data.

 

So first put the list of datasets to search in order into a macro variable.  Then use that list in a SET statement with the INDSNAME= option.  When you find the first one where any of the variables are not missing then save its name into a macro variable and stop.  You can then use that macro variable to do something with the latest dataset.

proc sql noprint;
select dsname
     , nvar
    into :dslist separated by ' '
       , :nvar trimmed
  from candidates
  order by date descending
;
quit;

%let lastest_ds=;
data _null_;
  length dsname $41 ;
  set &dslist indsname=dsname;
  if cmiss(of _all_) < &nvar then do;
    call symputx('latest_ds',dsname);
    stop;
  end;
run;
Emma2021
Quartz | Level 8
Some variables (the same name), for example, age is numbering in one data and character in another data. So, the second part code does not working. Sorry
Tom
Super User Tom
Super User

@Emma2021 wrote:
Some variables (the same name), for example, age is numbering in one data and character in another data. So, the second part code does not working. Sorry

So you have other issues you should fix. 

 

Are they intended to have the same variables every time?  How did they get created?  Did you make the mistake of converting a CSV file into a dataset using PROC IMPORT instead of writing your own data step to read the CSV file?

 

If you cannot fix the datasets to be compatible then it will be more work to find the first one that has a non-empty observation.

 

You might need to resort to writing and running a SAS macro.  Perhaps something like:

%macro find_last(dslist);
%if not %symexist(last_dsname) %then %global last_dsname;
%local i next last_dsname;
%let last_dsname=;
%do i=1 %to %sysfunc(countw(&dslist,%str( )));
  %let next=%scan(&dslist,&i,%str( ));
  data _null_;
    set &next ;
    array _n _numeric_ _1 ;
    array _c $1 _character_ _2 ;
    if cmiss(of _n[*] _c[*]) < dim(_n)+dim(_c) then do;
      call symputx('last_dsname',"&next");
      stop;
    end;
   run;
   %if %length(&last_dsname) %then %goto quit;
%end;
%quit:
%mend;

Which you can then call with the list of dataset and it will find the first one in the list that has at least one non-empty observation.

Patrick
Opal | Level 21

Below sample code for how this could work if "empty table" means a table with no rows.

/* create sample data */
options dlcreatedir;
libname mylib "%sysfunc(pathname(work))/test";

data mylib.dsn_23jan2023  mylib.dsn_24jan2023 mylib.dsn_25jan2023 mylib.dsn_all;
  format data_date date9.;
  set sashelp.class;
  data_date='23jan2023'd; output mylib.dsn_23jan2023;
  data_date='24jan2023'd; output mylib.dsn_24jan2023;
run;

/* select latest table with rows */
%let latest_ds_with_data;
proc sql noprint;
  select cats(libname,'.',memname) into :latest_ds_with_data trimmed
  from dictionary.tables
  where 
    libname ='MYLIB' 
    and memname like "DSN^_%" escape '^'
    and not missing(input(scan(memname,-1,'_'),date9.))
    and nlobs>0
  having max(input(scan(memname,-1,'_'),date9.))=input(scan(memname,-1,'_'),date9.)
  ;
quit;

/* print latest table with rows */
proc print data=&latest_ds_with_data;
run;
s_lassen
Meteorite | Level 14

Here is a way to get a list of the non-empty datasets beginning with DSN_ (assuming your folder is allocated as MYLIB):

data data_ok(keep=dsname);                                                                                                              
  set mylib.dsn_: indsname=dsn;                                                                                                                 
  array _chars _character_;                                                                                                             
  length dsname $30;                                                                                                                    
  if cmiss(of _chars(*))<dim(_chars) or n(of _numeric_) then do;                                                                        
    if dsname ne dsn then do;                                                                                                           
      dsname=dsn;                                                                                                                       
      output;                                                                                                                           
      end;                                                                                                                              
    end;                                                                                                                                
  retain dsname;                                                                                                                        
run;                                                                                                                                    

What it does: reads through all the datasets, if there is an observation with non-missing values (and the dataset name has not been output before), it outputs the name of the dataset.

 

Counting non-missing characters is a bit roundabout, because there is no character function like the N() function for numerics, only the CMISS function to count missing character values.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 601 views
  • 7 likes
  • 6 in conversation