BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Tom
Super User Tom
Super User

@2222 wrote:
Someone else has written code to remove the values (these are old values that are no longer valid), i want to check to make sure that the values have indeed been removed.

What does that mean? 

Terminology clarification.  SAS datasets consist of OBSERVATIONS (what you might call a row in a spreadsheet or a printed reported) and VARIABLES (what you might call a column in a spreadsheet or a printed report).

 

 

Do you mean they changed the values of some the variables to something else?  What did they change them to?
Do you mean they deleted the OBSERVATIONS that had an invalid value for at least one of the VARIABLES?

 

How do you know which values are invalid?  What makes them invalid?

 

You mentioned before about comparing the "original" and the "clean" dataset. If you have a lot of datasets how to know which ones to compare to each other?  Is there some pattern in the NAME of the datasets?

 

 

2222
Calcite | Level 5
Hi Tom, thanks for replying. Kurt is onto it i think. Basically there are datasets that have observations that were outdated (all were outdated) and someone deleted them all by hand. My job is to check, that for each specified variable within a dataset there should are no observations (make sure they have indeed all been removed). I made a mistake by saying i wanted to compare them and Kurt helped to clarify the wording of what we actually want to do, there is no need to compare the datasets, we just want to check the datasets have been "cleaned" ie had all the observations for the variables in question, removed. We still need to keep the variables though because the structure of the dataset needs to remain the same.
Kurt_Bremser
Super User

Please use correct diction. A variable cannot "have observations".

A dataset consists of observations (records or rows), which contain variables (columns), and the variables contain values.

As long as there are observations in a dataset, all variables defined will have values. These values may be missing or non-missing. You cannot have a "variable with no observations", unless the dataset is empty (no observations at all).

 

In light of this, restate your issue, using the correct terms.

2222
Calcite | Level 5
sorry, i thought Tom was correcting me (above) and i changed the terminology to match his correction, im not very good at this stuff and sorry for the confusion my communication skills are causing.
the two comments you left:
1. So you do not want to remove variables, but values??
2. So you need to check for the given variables that the count of non-missing values is zero?
Were both correct and that's what i've tried to do with proc sql and count steps, but as the number of variables to check increases i think there must be a better/faster way to do this?
fja
Lapis Lazuli | Level 10 fja
Lapis Lazuli | Level 10

@2222 wrote:
Someone else has written code to remove the values (these are old values that are no longer valid), i want to check to make sure that the values have indeed been removed.

As far as I understood you the situation is: Somebody has already cleaned your data by deleting the contents of certain columns in a certain subset of rows. And now you would like to check if your tables are really clean.

Maybe you could post that code here to give us an idea of the action previously taken and the resulting table. ... and maybe you add a few lines of code to add an example table to let us reproduce the status of your data.

--fja

fja
Lapis Lazuli | Level 10 fja
Lapis Lazuli | Level 10
I do not fully understand your problem.
1. Do you want to query a specific table/dataset or are you after a generic solution (i.e. applicable to any dataset)?
2. What does "variable has been cleared" mean? Would you like to check a table per row if a certain variable and its value?
3. ... and following #2: What does "cleared" mean. Set to something or containing "." (i.e. is missing)?
--fja
2222
Calcite | Level 5
hello fja, thank you. cleared means the values have been deleted/removed, the variable, as a column header still remains, but i need to make sure/check that the values for the variable are all gone/have been removed. some values are string, some are numbers. there are multiple datasets that i need to check
fja
Lapis Lazuli | Level 10 fja
Lapis Lazuli | Level 10

Understood ... see my answer above ... maybe you could provide an example.

2222
Calcite | Level 5
hello fja, thank you below is what im currently doing for 1 dataset, it works but im thinking its bad code:

libname mylib "\folder path";

title "a1"; *add a title for each dataset so that the results table is clear;
proc sql;
select count(a) as a,
count(b) as b,
count(c) as c,
count(d) as d,
count(e) as e,
count(f) as f,
count(g) as g,
count(h) as h,
count(i) as i,
count(j) as j,
count(k) as k,
count(l) as l,
count(m) as m,
count(n) as n,
count(o) as o,
count(p) as p
from mylib.a1_cleaned;
quit;
fja
Lapis Lazuli | Level 10 fja
Lapis Lazuli | Level 10

Is the number, name and type of columns fixed ?
Are 0 or "" valid entries, i.e. would be considered _not_ deleted?

2222
Calcite | Level 5
no the number of variables in each dataset are different. some share the same variables but it changes in others. each "cell" in the dataset that we want cleaned should have nothing in it so 0 is no good and niether is "" as you said, they would be considered not deleted
Patrick
Opal | Level 21

Posting some code and/or sample data for Have and Want really helps to understand what you want. 

Using your code as starting point below how you could get the list of variables dynamically.

data have;
  infile datalines truncover dsd;
  input varA varB $ varC;
  datalines;
.,X,.
.,X,2
.,X,.
., ,4
;

%macro no_miss_cnt(ds=);
  %let ds=%upcase(&ds);
  %local cnt_list;
  proc sql noprint;
    select cats('count(',name,') as')||' '||name into :cnt_list separated by ','
    from dictionary.columns
    where 
      libname="%scan(WORK.&DS,-2,.)" and
      memname="%scan(&DS,-1,.)"
    ;
  quit;

  proc sql;
    select 
      count(*) as __n_rows label='N Observations',
      &cnt_list
    from &ds
    ;
  quit;
%mend;
%no_miss_cnt(ds=work.have);

Patrick_0-1672919447373.png

 

2222
Calcite | Level 5
thank you Patrick, the code i posted with the counts does work and does give the desired outcome but i have to repeat the code for each dataset and add/remove variables manually, which i am hoping to avoid because some of the datasets are really huge with many variables. i will try your code and thank you for posting and you help
Tom
Super User Tom
Super User

So you want the number of non-missing values for each variable?

The syntax is much easier if you use SAS code instead of SQL code.

For example to count the number of non-missing values for all of the numeric variables in a dataset just use PROC SUMMARY.  

proc summary data= mylib.a1_cleaned ;
  var _numeric_;
  output out=want n= ;
run;

Or you could use code generation.  So first get a list of the variables in the dataset.  For example you could use PROC TRANSPOSE to do that.

proc transpose data= mylib.a1_cleaned(obs=0) out=names; run;

And once you have the names you could use that to generate the variable list in your SQL select statement. 

If the number of variable is small then you could use a single macro variable to store the code:

proc sql;
reset noprint;
select catx(' ','count(',nliteral(_name_),') as',nliteral(_name_))
  into :list separated by ','
  from names
;
reset print;
select &list from mylib.a1_cleaned;
quit;

If it is larger then use a data step to write the code to a file and then %INCLUDE the file.

filename code temp;
data _null_;
  set names end=eof ;
  file code;
  if _n_=1 then put 'select' / ' ' @;
  else put ',' @ ;
  length nliteral $60 ;
  nliteral=nliteral(_name_);
  put 'count(' nliteral ') as ' nliteral ;
  if eof then put 'from mylib.a1_cleaned;';
run;
proc sql;
%include code / source2;
quit;
2222
Calcite | Level 5
hi Tom, yes, the number of non-missing values for each variable. will you code (and thank you for posting!) handle the character variables and variables that have string entries?

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 35 replies
  • 2914 views
  • 3 likes
  • 7 in conversation