I used the following code.
proc sql;
create table fault_CNT_only as
select Unit, fault_code, count(*) as fault_count
from all_units_results
group by Unit, fault_code;
quit;
proc sql;
create table fault_CNT as
select A.Unit, A.fault_code, count(*) as fault_count,
B.sw_install_date
from all_units_results A left join InstallDate B
on A.Unit=B.vehicle_no
group by Unit, fault_code
;
quit;
The output of fault_CNT_only looks like:
8116 20-1803 1
8118 20-0715 1
8119 20-1801 2
8119 20-1802 2
8119 20-1803 2
8119 20-1804 3
8119 20-1806 2
8133 20-0794 1
while the output of fault_CNT looks like:
8116 20-1803 1 02JAN2015
8118 20-0715 1 01JAN2015
8119 20-1801 2 01JAN2015
8119 20-1801 2 01JAN2015
8119 20-1802 2 01JAN2015
8119 20-1802 2 01JAN2015
8119 20-1803 2 01JAN2015
8119 20-1803 2 01JAN2015
8119 20-1804 3 01JAN2015
8119 20-1804 3 01JAN2015
8119 20-1804 3 01JAN2015
8119 20-1806 2 01JAN2015
8119 20-1806 2 01JAN2015
8133 20-0794 1 01JAN2015
How can I get the same number of rows for fault_CNT as in fault_CNT_only?
Thanks for your help.
You have to remove your sw_install_date variable that you are brining in from the InstallDate dataset to get the same count. If you need the sw_install_date variable, then you should put it in the GROUP BY and before the count() function in your SELECT statement, although it doesn't mean that you will get the same count. If you have a Unit and Fault_Code combination with multiple sw_install_date values, then you will get more records than the original PROC SQL.
It looks like you'll need to aggregate/remove the date then somehow. How do you want the date information to be shown?
EDIT: Fixed spelling of word 'date'
You have to remove your sw_install_date variable that you are brining in from the InstallDate dataset to get the same count. If you need the sw_install_date variable, then you should put it in the GROUP BY and before the count() function in your SELECT statement, although it doesn't mean that you will get the same count. If you have a Unit and Fault_Code combination with multiple sw_install_date values, then you will get more records than the original PROC SQL.
Dear dcruik,
I'm not sure I explained the problem well enough. thank you for your original comment. I used your idea and it worked correctly. The code I used has a minor change in the select ordering from what you suggested.
proc sql;
create table fault_CNT as
select A.Unit, A.fault_code, count(*) as fault_count,
B.sw_install_date
from all_units_results A left join InstallDate B
on A.Unit=B.vehicle_no
group by Unit, fault_code, B.sw_install_date
;
quit;
The final output has no duplicates and the same number of rows as fault_CNT_only.
8116 20-1803 1 02JAN2015
8118 20-0715 1 01JAN2015
8119 20-1801 2 01JAN2015
8119 20-1802 2 01JAN2015
8119 20-1803 2 01JAN2015
8119 20-1804 3 01JAN2015
8119 20-1806 2 01JAN2015
8133 20-0794 1 01JAN2015
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.