I used the following code.
proc sql;
create table fault_CNT_only as
select Unit, fault_code, count(*) as fault_count
from all_units_results
group by Unit, fault_code;
quit;
proc sql;
create table fault_CNT as
select A.Unit, A.fault_code, count(*) as fault_count,
B.sw_install_date
from all_units_results A left join InstallDate B
on A.Unit=B.vehicle_no
group by Unit, fault_code
;
quit;
The output of fault_CNT_only looks like:
8116 20-1803 1
8118 20-0715 1
8119 20-1801 2
8119 20-1802 2
8119 20-1803 2
8119 20-1804 3
8119 20-1806 2
8133 20-0794 1
while the output of fault_CNT looks like:
8116 20-1803 1 02JAN2015
8118 20-0715 1 01JAN2015
8119 20-1801 2 01JAN2015
8119 20-1801 2 01JAN2015
8119 20-1802 2 01JAN2015
8119 20-1802 2 01JAN2015
8119 20-1803 2 01JAN2015
8119 20-1803 2 01JAN2015
8119 20-1804 3 01JAN2015
8119 20-1804 3 01JAN2015
8119 20-1804 3 01JAN2015
8119 20-1806 2 01JAN2015
8119 20-1806 2 01JAN2015
8133 20-0794 1 01JAN2015
How can I get the same number of rows for fault_CNT as in fault_CNT_only?
Thanks for your help.
You have to remove your sw_install_date variable that you are brining in from the InstallDate dataset to get the same count. If you need the sw_install_date variable, then you should put it in the GROUP BY and before the count() function in your SELECT statement, although it doesn't mean that you will get the same count. If you have a Unit and Fault_Code combination with multiple sw_install_date values, then you will get more records than the original PROC SQL.
It looks like you'll need to aggregate/remove the date then somehow. How do you want the date information to be shown?
EDIT: Fixed spelling of word 'date'
You have to remove your sw_install_date variable that you are brining in from the InstallDate dataset to get the same count. If you need the sw_install_date variable, then you should put it in the GROUP BY and before the count() function in your SELECT statement, although it doesn't mean that you will get the same count. If you have a Unit and Fault_Code combination with multiple sw_install_date values, then you will get more records than the original PROC SQL.
Dear dcruik,
I'm not sure I explained the problem well enough. thank you for your original comment. I used your idea and it worked correctly. The code I used has a minor change in the select ordering from what you suggested.
proc sql;
create table fault_CNT as
select A.Unit, A.fault_code, count(*) as fault_count,
B.sw_install_date
from all_units_results A left join InstallDate B
on A.Unit=B.vehicle_no
group by Unit, fault_code, B.sw_install_date
;
quit;
The final output has no duplicates and the same number of rows as fault_CNT_only.
8116 20-1803 1 02JAN2015
8118 20-0715 1 01JAN2015
8119 20-1801 2 01JAN2015
8119 20-1802 2 01JAN2015
8119 20-1803 2 01JAN2015
8119 20-1804 3 01JAN2015
8119 20-1806 2 01JAN2015
8133 20-0794 1 01JAN2015
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.