Good afternoon !
Currently got stuck with Proc SQL code. When I use group by operation, some missing rows are spoiling the output result.
According to SQL logic, I should use aggregation function (e.g. sum/avg) to collapse rows, but there is no aggregation function to exclude missing row from a group.
Example:
Input data
| ID | Customer | Date | Sum | Field |
| ID1 | C1 | 05.05.2021 | 15 | AAA |
| ID2 | C1 | 05.05.2021 | 16 | |
| ID3 | C1 | 05.05.2021 | 21 | AAA |
| ID4 | C2 | 21.05.2021 | 30 | BBB |
I use this code:
proc sql noprint;
create table output as
select Customer, Date, Field
,sum(SUM) as Sum
,count(*) as count
from work.data_test
group by Customer, Date, Field
;quit;And get this:
| Customer | Date | SUM | Field | count |
| C1 | 05.05.2021 | 36 | AAA | 2 |
| C1 | 05.05.2021 | 16 | 1 | |
| C2 | 21.05.2021 | 30 | BBB | 1 |
Instead I need to get this:
| Customer | Date | Sum | Field | count |
| C1 | 05.05.2021 | 52 | AAA | 3 |
| C2 | 21.05.2021 | 30 | BBB | 1 |
Is there any possible solution ?
Thanks
The missing value for FIELD is of course a valid level, just like 'AAA' or 'BBB'.
If you don't want this, you have to impute the missing values.
You can for example do a Last Observation Carry Forward (LOCF = Forward Filling) to impute the missing value with a non-missing level.
Regards,
Koen
What is the logic that says the missing FIELD should be replace by AAA in your example?
Why not include it as FIELD='BBB' or 'CCC'?
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.