BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Elle
Quartz | Level 8

Hello,

 

I have an input dataset with 4 important variables: ID, A1, A2, A3.

After processing it, I've split it into 5 datasets: ERR1, ERR2, ERR3, MODEL, NEWINPUT, with the same inputs.

I need to validate the results so I'm trying to do a query to see if:

SUM(input.A1) is equal to  SUM( SUM(ERR1.A1), SUM(ERR2.A1) ....., SUM(NEWINPUT.A1) ) 

Also I need to group it by ID.

My output should look like this:

 

ID        INPUT_A1             OUTPUT_A1                                  INPUT_A2  OUTPUT_A2  INPUT_A3  OUTPUT_A3          
1        SUM(input.A1)         SUM( SUM(ERR1.A1), SUM(ERR2.A1) .....)       ...        ...        ...       ...
 

 

 

My current code, which is not working, looks like this:

proc sql;
	select coalesce(a.Id, b.Id, c.Id, d.Id, e.Id, f.Id)  as ID,
		   sum(a.A1) as IN_A1, 
		   sum( sum(b.A1), sum(c.A1), sum(d.A1), sum(e.A1), sum(f.A1) ) as OUT_A1,
		   sum(a.A2) as IN_A2, 
		   sum( sum(b.A2), sum(c.A2), sum(d.A2), sum(e.A2), sum(f.A2) ) as OUT_A2,
		   sum(a.A3) as IN_A3,
		   sum( sum(b.A3), sum(c.A3), sum(d.A3), sum(e.A3), sum(f.A3) ) as OUT_A3
	from Input as a, 
		 err.err1 as b,
		 err.err2 as c,
		 err.err3 as d,
		 err.model as e,
		 err.newinput as f
	where a.Id=b.Id=c.Id=d.Id=e.Id=f.Id
	group by calculated ID;
quit;

Any help?

 

1 ACCEPTED SOLUTION

Accepted Solutions
LinusH
Tourmaline | Level 20
Hi
I think it's better if you add information in a separate post (not thread!), it's easier to follow the dialogue.

You could create empty template data sets which you append your results to, ensuring you consistent layout.
Another option is to make this logic into a macro, will let you set parameters and macro logic can do checks on the input data.

But, why do you need this at all, doesn't sound like a one time thing. Don't you trust the logic in the preceeding steps?
Data never sleeps

View solution in original post

6 REPLIES 6
LinusH
Tourmaline | Level 20

What is not working...?

First, since you are using inner join, no need to do the coalesce() and the calculated thing.

Second, what does uniquely identify an observation in the tables?

If it isn't ID, you will end up with M-M in your result set.

If it,s ID, no need for group by.

Sample data always help input and output.

 

Data never sleeps
Elle
Quartz | Level 8

UPDATE1:

The INPUT dataset contains information about customer purchases.

ERR1 - is supposed to contain certain rows from the INPUT that fall under ERROR1 (EMPTY at the moment)

ERR2 - rows from INPUT that fall under ERROR2 (EMPTY at the moment)

ERR3 - rows from INPUT that fall under ERROR3 (Has about 10 rows)

MODEL - rows from INPUT that are VALID (Has the bulk of the data)

NEWINPUT - rows from INPUT that will be taken to the next time period where extra data is added and at one point it will become the new input file for the exact same analysis (contains about 5 rows at this point)

All of these tables are mutually exclusive and have the same structure so it would make sense to validate if the split is done correctly by simply verifying the sums of the columns for each variable:

 

VAR A: SUM(A) for INPUT = SUM (SUM(A) for ERR1, SUM(A) for ERR2, SUM(A) for ERR3, SUM(A) for MODEL, SUM(A) for NEWINPUT)

 

But all I get with my query is a table with the right header and missing values for each column

Hope it's more clear. Cannot share data as it's sensitive.

 

PS: ID is unique key

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Why split the dataset into 5 separate datasets in the first place?  As @LinusH has mentioned, provide example test data (form of a datastep) and what the output should look like.  I would imagine there are better techniques, using by group for instance.

Elle
Quartz | Level 8

UPDATE2:

I've identified the problem: SUM function doesn't operate on empty datasets. Something with its argument needing to be numeric.

 

Does anyone know a way around this? The 5 datasets are quite interactive and, depending on the INPUT data, each one of them can be empty at any time.

LinusH
Tourmaline | Level 20
Hi
I think it's better if you add information in a separate post (not thread!), it's easier to follow the dialogue.

You could create empty template data sets which you append your results to, ensuring you consistent layout.
Another option is to make this logic into a macro, will let you set parameters and macro logic can do checks on the input data.

But, why do you need this at all, doesn't sound like a one time thing. Don't you trust the logic in the preceeding steps?
Data never sleeps
Elle
Quartz | Level 8

The 5 tables are delivered to my supervisor that wants to validate my work.

I'm validating it beforehand to make sure he doesn't get any strange results.

 

Thank you for the input. I think I'll use macro variables to check the sums.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 6043 views
  • 1 like
  • 3 in conversation