BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
mayasak
Quartz | Level 8

Hi,

 

I work on a project that gets data from different sources and provides a dataset of new cases on a daily bases. I import data from different excel sheets into SAS where I merge them and then clean and deduplicate them and create a dataset of new cases only.

First, what function can I use to create the dataset? Also, is there a way to automate the creation of this dataset?

Here's the SAS program that I use to create the daily data so far:

 

%macro data(sheet, name);
proc import out= &name replace
datafile = "R:\Hospital-Outreach\Patient Safety Information Exchange\PSIE_Data\PSIEDataProcess\PSIE_DataToSAS.xlsx"
dbms = Excel;
sheet = "&sheet";
getnames = yes;
run;
%mend data;


%data(inc, inc);
%data(efc, efc);
%data(lab, lab);
%data(NHSN, NHSN);
%data(LiveList, LiveList);
proc contents data = efc; run;

%MACRO MSORT1(DATASET,VAR);
PROC SORT DATA=&DATASET;
BY &VAR;
RUN;
%MEND MSORT1;


%MACRO MMERGE(DEST,SOURCE1,SOURCE2,VAR);

%msort1(&source1, &var);
%msort1(&source2, &var);

DATA &DEST;
update &SOURCE1 &SOURCE2;
BY &VAR;
RUN;
%MEND MMERGE;
%MMERGE (WORK.labinc, WORK.Inc, WORK.lab, incident_ID);
%MMERGE (WORK.IRIS, WORK.labinc, WORK.efc, person_ID);
%MMERGE (WORK.All, WORK.IRIS, WORK.NHSN, Full_Name);

/*Data deduplication for null addresses*/


proc sql;
create table Alldup as
select *
from All
group by full_name
having Address1=max(address1)
order by full_name, person_id, DOB;
quit;

 

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Datasets are created with a DATA step, the CREATE TABLE statement in SQL, or the OUT= option or OUTPUT statement in certain procedures.

Automation of SAS programs is best done by saving them to a .sas file to run the code in batch mode, and have this executed by your preferred scheduler.

View solution in original post

2 REPLIES 2
Kurt_Bremser
Super User

Datasets are created with a DATA step, the CREATE TABLE statement in SQL, or the OUT= option or OUTPUT statement in certain procedures.

Automation of SAS programs is best done by saving them to a .sas file to run the code in batch mode, and have this executed by your preferred scheduler.

ballardw
Super User

Unless your data sources are extremely consistent then reliance on Proc Import to read XLSX files will eventually have a problem with combining data. Proc Import has to make a bunch of GUESSES based on the content of each file independently of other files. So variable names, data types and lengths of variables can change from file to file.

 

If the files are supposed to be of the same structure you might be better off using the LIBNAME to associate your XLSX file and then parsing the data in the sheets to a standard form.

 

I am also a bit concerned about your max(address1). If your "address1" is a character variable then max may not be returning what you think it is. From some experience I would not expect any address field to ever be numeric as even things that are supposed to be street numbers end up with bits like "Apt A", "123 1/2" and such.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 853 views
  • 0 likes
  • 3 in conversation