Hi,
I work on a project that gets data from different sources and provides a dataset of new cases on a daily bases. I import data from different excel sheets into SAS where I merge them and then clean and deduplicate them and create a dataset of new cases only.
First, what function can I use to create the dataset? Also, is there a way to automate the creation of this dataset?
Here's the SAS program that I use to create the daily data so far:
%macro data(sheet, name);
proc import out= &name replace
datafile = "R:\Hospital-Outreach\Patient Safety Information Exchange\PSIE_Data\PSIEDataProcess\PSIE_DataToSAS.xlsx"
dbms = Excel;
sheet = "&sheet";
getnames = yes;
run;
%mend data;
%data(inc, inc);
%data(efc, efc);
%data(lab, lab);
%data(NHSN, NHSN);
%data(LiveList, LiveList);
proc contents data = efc; run;
%MACRO MSORT1(DATASET,VAR);
PROC SORT DATA=&DATASET;
BY &VAR;
RUN;
%MEND MSORT1;
%MACRO MMERGE(DEST,SOURCE1,SOURCE2,VAR);
%msort1(&source1, &var);
%msort1(&source2, &var);
DATA &DEST;
update &SOURCE1 &SOURCE2;
BY &VAR;
RUN;
%MEND MMERGE;
%MMERGE (WORK.labinc, WORK.Inc, WORK.lab, incident_ID);
%MMERGE (WORK.IRIS, WORK.labinc, WORK.efc, person_ID);
%MMERGE (WORK.All, WORK.IRIS, WORK.NHSN, Full_Name);
/*Data deduplication for null addresses*/
proc sql;
create table Alldup as
select *
from All
group by full_name
having Address1=max(address1)
order by full_name, person_id, DOB;
quit;
Thank you
Datasets are created with a DATA step, the CREATE TABLE statement in SQL, or the OUT= option or OUTPUT statement in certain procedures.
Automation of SAS programs is best done by saving them to a .sas file to run the code in batch mode, and have this executed by your preferred scheduler.
Datasets are created with a DATA step, the CREATE TABLE statement in SQL, or the OUT= option or OUTPUT statement in certain procedures.
Automation of SAS programs is best done by saving them to a .sas file to run the code in batch mode, and have this executed by your preferred scheduler.
Unless your data sources are extremely consistent then reliance on Proc Import to read XLSX files will eventually have a problem with combining data. Proc Import has to make a bunch of GUESSES based on the content of each file independently of other files. So variable names, data types and lengths of variables can change from file to file.
If the files are supposed to be of the same structure you might be better off using the LIBNAME to associate your XLSX file and then parsing the data in the sheets to a standard form.
I am also a bit concerned about your max(address1). If your "address1" is a character variable then max may not be returning what you think it is. From some experience I would not expect any address field to ever be numeric as even things that are supposed to be street numbers end up with bits like "Apt A", "123 1/2" and such.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.