BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
zoomzoom
Obsidian | Level 7

I had a problem before, and I got a solution to that. The solution was working fine, and now there is some more problem.

 

Note: I am opening a new thread on the old with a version number, as I marked the last problem as 'solved' .

 

Original Problem:

 

I have some 40 variables, on which I have to perform 10 different data quality (Null, zero value, length, validity of date..) checks. And I am generating one table for each  1Var * 1 DQ check, resulting in 100 tables. Each table will hold just a control total, i.e. count of distinct value for a variable.

Proc sql;
create table DQ_Check_Zero_Var1 as
select count(distinct Var1) as Count_Zero from Source_table;
quit;


Now, I would like to create a master table in this format. Each cell will hold the only value from the corresponding table.


    Stat                     Var1               Var2               Var3             Var4               Var5              ...
 Check_Zero         (number)        (number)       (number)     (number)      (number)

Check_Null

    .

    .

    .

    .

 

To this, @PGStats has provided an appropriate solution. Below is the entire solution he had provided and the original link to the post.

 (Link - https://communities.sas.com/t5/General-SAS-Programming/Building-master-table-with-input-from-multipl...

 

Solution:

I assumed that your datasets have names like DQ_Check_Stat_Var where Stat is the name of the statistic and Var is the name of the variable

 

Proc sql;
create table DQ_Check_N_height as
select count(height) as Count_Not_Null from SasHelp.class;
quit;

Proc sql;
create table DQ_Check_N_weight as
select count(weight) as Count_Not_Null from SasHelp.class;
quit;

Proc sql;
create table DQ_Check_Range_height as
select range(height) as Range from SasHelp.class;
quit;

Proc sql;
create table DQ_Check_Range_weight as
select range(weight) as Range from SasHelp.class;
quit;

data all;
set DQ_Check_: INDSNAME=ds;
length var Stat $32;
var = propcase(scan(ds, -1, "_"));
stat = propcase(scan(ds, -2, "_"));
value = coalesce(of _numeric_);
keep var stat value;
run;

proc sort data=all; by stat var; run;

proc transpose data=all out=want( drop=_: );
by stat;
id var;
var value;
run;

proc print data=want noobs; run;

 

 

 

New Problem and Error:

 

My code looks like:

 

Data DQ_Stat_Var(n)_Pre; 

Set Source_Table;

<conditons...>

run;

 

(Note: Since variable names are quite long, to keep track which var I am working on, I have taken first 6 characters from variable names and attached each with a sequence number generated with _n_ option. There is no underscore between Var and number. I also made sure that no generated dataset names are more 32 characters)

 

Proc sql;
create table DQ_Stat_Var(n) as
select count(distinct Var) as Cnt from DQ_Stat_Var(n)_Pre;
quit;

 

After all the datasets are generated, I use this proposed code:

    data all;
       set DQ_Check_: INDSNAME=ds;
       length var Stat $32;
       var = propcase(scan(ds, -1, "_"));
       stat = propcase(scan(ds, -2, "_"));
       value = coalesce(of _numeric_);
       keep var stat value;
   run;

 

And now I get this error:

 

Data All;

Set libname.DQ_:INDSNAME=ds;

ERROR: Variable BRANCH_NUMBER has been defined as both character and numeric.

ERROR: Variable BRANCH_NUMBER has been defined as both character and numeric.

ERROR: Variable BRANCH_NUMBER has been defined as both character and numeric.

  length var Stat $32;
       var = propcase(scan(ds, -1, "_"));
       stat = propcase(scan(ds, -2, "_"));
       value = coalesce(of _numeric_);
       keep var stat value;
   run;

 

Note: I do not have any variable called Branch_Number. 

 

Thank you all in advance.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Agree with @LinusH

 

Suggest you execute your code up to the data step where you get the Error and then run the following:

proc sql;
  create table find_it as
  select *
  from dictionary.columns
  where 
    libname='<libref>' 
    and memname like 'DQ^_%' escape '^'
    and upcase(name)='BRANCH_NUMBER '
  ;
quit;

  

View solution in original post

9 REPLIES 9
LinusH
Tourmaline | Level 20
First: great description of your situation!

I haven't gone through your code in detail, and it's hard to see what's going on without source data. But BRANCH_NUMBER is comming from somewhere, you need to find out from where by looking through all your temporary tables.
I think that you now are paying the price for working with wide data - it drives syntax more than narrow/long table designs.
Data never sleeps
Patrick
Opal | Level 21

Agree with @LinusH

 

Suggest you execute your code up to the data step where you get the Error and then run the following:

proc sql;
  create table find_it as
  select *
  from dictionary.columns
  where 
    libname='<libref>' 
    and memname like 'DQ^_%' escape '^'
    and upcase(name)='BRANCH_NUMBER '
  ;
quit;

  
zoomzoom
Obsidian | Level 7

Thank you @Patrick and @LinusH

 

Using your code, I have found the variable BRANCH_NUMBER. It is in two source tables in two different data types, but it is not in the list of variables I am pulling in for data quality check.

Now, I have just two types of output dataset name, i.e. DQ_Stat_Var_Pre and DQ_Stat_Var.

 

Data DQ_Stat_Var_Pre;

Set Source_Table;

<conditions>;

run;

 

Proc Sql Noprint;

Create table DQ_Stat_Var as

  Select Count(Distinct Var) As Cnt from DQ_Stat_Var_Pre;

Quit;

 

The variable BRANH_NUMBER only appears in some of the output _Pre tables. Previously when the code was working without error with a smaller subset of variables, I was dropping records when Stat='Pre'. Now, I cannot even run this commented piece of code, as I get the same error.

 

Data All;

Set libname.DQ_:INDSNAME=ds;

/*

Length var stat $32;

var=propcase(scan(ds, -1, "_"));

stat=propcase(scan(ds, -1, "_"));

value=coalesce(of _numeric_);

keep var stat value;

*/

run;

 

I can certainly drop the variable BRANCH_NUMBER while pulling data from the source tables. The entire code runs for quite some time, and I have to start from the begining.  Is there another way to counter the error with a similar type of INDSNAME operation?

 

Thank you for your valuable input.

PGStats
Opal | Level 21

If BRANCH_NUMBER is not required, I would try

 

Set libname.DQ_: (drop=branch_number)  INDSNAME=ds;

PG
zoomzoom
Obsidian | Level 7

Thank you @PGStats.

 

This is what happens now:

 

Data All;

Set libname.DQ_: (drop=branch_number) INDSNAME=ds;

ERROR: The variable BRANCH_NUMBER in the DROP, KEEP or RENAME list has never been referenced.

 

<rest of the code>

 

And the original error remains when I remove the Drop option.

PGStats
Opal | Level 21

If I get it right, you don't want to read in the _pre datasets when creating dataset all. So, create your summary datasets with a different prefix and read only those when creating all

PG
zoomzoom
Obsidian | Level 7
Sure, I can do that. But before I do that, as that is quite time consuming, is there any other way to fetch the data for non _Pre datasets? I have made a temporary table with the values from dictionary.columns, and I have dropped the values containing _Pre. Is it possible to use this temp table to fetch the values from the summary tables?
Patrick
Opal | Level 21

Citing @LinusH 

"I think that you now are paying the price for working with wide data - it drives syntax more than narrow/long table designs."

 

It can happen that we take a wrong turn and it's sometimes worth to just re-start instead of trying to continue on the path we've taken. Normally when you re-think the problem with an open mind and all the experience already gathered easier solution arise.

I feel you're at the point where you should re-think and re-start.

 

I haven't understood in full yet what you're trying to achieve. I believe it would help if you could describe as clear as possible what you have and what you need - and why.

Please provide some sample data (a data step creating such data) for the have and then describe what you need (eventually also providing some data for illustration purposes). Let us also know the source data volumes you're dealing with.

 

zoomzoom
Obsidian | Level 7
Thank you, @Patrick.

I have changed the dataset names and it is working fine now.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1329 views
  • 0 likes
  • 4 in conversation