Hello,
This is my first post here, so I'll try to be as clear as possible (I'm currently using SAS OnDemand for Academics).
I have to generate a listing of queries monthly, and use the previous generated queries sas file (if it exists) to update the new one and avoid sending a query which has already been answered.
What I want to update in the sas file would be the date and the status of the newly generated queries.
Here is a short example of the sas file structure, and how it should looks like at the end :
This is the first iteration, the generated table always has the Ongoing status.
data table1;
infile cards dsd dlm=',' truncover ;
length record_id $7 Center $2 Event_name $20 Repeat_instance Form $20 Query $50 Date_creation Editcheck_ID $20 Status $50;
input record_id $ Center $ Event_name $ Repeat_instance Form $ Query $ Date_creation :yymmdd10. Editcheck_ID $ Status $;
cards;
01-001, 01, Procedure, ., Procedure, Incorrect date, 2024-08-06, procdt_A, Ongoing
01-001, 01, Procedure, ., Treatment, Missing data, 2024-08-06, trtname_A, Ongoing
01-001, 01, Procedure, 1, AE, Discrepancy in AE date, 2024-08-06, aedt_A, Ongoing
02-015, 02, Visit 1, ., Treatment, Missing data, 2024-08-06, trtname_A, Ongoing
02-015, 02, Visit 1, 3, AE, AE is not declared, 2024-08-06, aename_A, Ongoing
02-003, 02, Procedure, ., Treatment, Out of bounds data, 2024-08-06, trtdose_A, Ongoing
;
run;
I send the previous table in a Excel file to each center, and I receive it with the status updated a month later.
data table1_returned;
infile cards dsd dlm=',' truncover ;
length record_id $7 Center $2 Event_name $20 Repeat_instance Form $20 Query $50 Date_creation Editcheck_ID $20 Status $50;
input record_id $ Center $ Event_name $ Repeat_instance Form $ Query $ Date_creation :yymmdd10. Editcheck_ID $ Status $;
format Date_creation yymmdd10.;
cards;
01-001, 01, Procedure, ., Procedure, Incorrect date, 2024-08-06, procdt_A, Ongoing
01-001, 01, Procedure, ., Treatment, Missing data, 2024-08-06, trtname_A, Closed - value corrected
01-001, 01, Procedure, 1, AE, Discrepancy in AE date, 2024-08-06, aedt_A, Closed - value corrected
02-015, 02, Visit 1, ., Treatment, Missing data, 2024-08-06, trtname_A, Ongoing
02-015, 02, Visit 1, 3, AE, AE is not declared, 2024-08-06, aename_A, Closed - value corrected
02-003, 02, Procedure, ., Treatment, Out of bounds data, 2024-08-06, trtdose_A, Closed - value not corrected
;
run;
This is the second iteration, some older queries appear again.
data table2;
infile cards dsd dlm=',' truncover ;
length record_id $7 Center $2 Event_name $20 Repeat_instance Form $20 Query $50 Date_creation Editcheck_ID $20 Status $50;
input record_id $ Center $ Event_name $ Repeat_instance Form $ Query $ Date_creation :yymmdd10. Editcheck_ID $ Status $;
cards;
05-002, 05, Inclusion, ., Inclusion, Incorrect date, 2024-09-13, incdt_A, Ongoing
01-001, 01, Procedure, ., Procedure, Incorrect date, 2024-09-13, procdt_A, Ongoing
05-002, 05, Inclusion, ., Medical history, Out of bounds data, 2024-09-13, mhiketdose_A, Ongoing
06-007, 06, Inclusion, ., Demography, Discrepancy in Birth date, 2024-09-13, dembddt_A, Ongoing
07-015, 07, Visit 3, ., Treatment, Missing data, 2024-09-13, trtname_A, Ongoing
02-015, 02, Visit 1, ., Treatment, Missing data, 2024-09-13, trtname_A, Ongoing
02-003, 02, Procedure, ., Treatment, Out of bounds data, 2024-09-13, trtdose_A, Ongoing
;
run;
And this is where I'm stuck, on how to update the new dataset by comparing it to the previous one. Since each real dataset can be a thousand lines long, I don't think a proc compare is the best.
If the queries already exists and has been answered (ex : out of bounds data) it should be deleted, if it exists and has not been answered, the Date_creation should be updated.
data table2_final;
infile cards dsd dlm=',' truncover ;
length record_id $7 Center $2 Event_name $20 Repeat_instance Form $20 Query $50 Date_creation Editcheck_ID $20 Status $50;
input record_id $ Center $ Event_name $ Repeat_instance Form $ Query $ Date_creation :yymmdd10. Editcheck_ID $ Status $;
cards;
05-002, 05, Inclusion, ., Inclusion, Incorrect date, 2024-09-13, incdt_A, Ongoing
01-001, 01, Procedure, ., Procedure, Incorrect date, 2024-08-06, procdt_A, Ongoing
05-002, 05, Inclusion, ., Medical history, Out of bounds data, 2024-09-13, mhiketdose_A, Ongoing
06-007, 06, Inclusion, ., Demography, Discrepancy in Inclusion date, 2024-09-13, dembddt_A, Ongoing
07-015, 07, Visit 3, ., Treatment, Missing data, 2024-09-13, trtname_A, Ongoing
02-015, 02, Visit 1, ., Treatment, Missing data, 2024-08-06, trtname_A, Ongoing
;
run;
Thank you for your answers !
Your example code is creating Date_creation as a character variable length 20 (and Repeat_instance). Either assign a length of 8 (Not $ 😎 to numeric variables or don't place them on a length statement. One suspects was the result of a quicky approach to keep variables in specific column order. But this is causing errors because the character values of Creatio_date cannot use the numeric informat YYMMDD10.
See if this handles the Table1 and Table1_returned as desired.
proc sort data=table1; by record_id Event_name Repeat_instance Form Query Editcheck_ID; run; proc sort data=table1_returned; by record_id Event_name Repeat_instance Form Query Editcheck_ID; run; data example; update table1 (in=inbase) table1_returned (in=inreturn drop=Date_creation) ; by record_id Event_name Repeat_instance Form Query Editcheck_ID; if inreturn and status=:"Closed" then delete; /* what date would be assigned???*/ run;
Really need some rules about what "creation_date" would be changed to. You didn't provide any guideline.
This update might require dropping the values of Creation_date from the Returned table, otherwise matches not dropped would have the creation_date updated to the value in the returned data set. If your returned data has more variables than shown then on the Update statement only the BY variables plus the Status should be kept as dataset options for the returned data.
What combination of fields (if any) form a unique identifier for your records?
Your answer made me realize my example was incomplete, so I updated the tables I used with the Editcheck_ID variable (which is the variable name associated to the discrepancy).
The unique identifier would be a combination of record_id+Event_name+Repeat_instance+Form+Query+Editcheck_ID
Your example code is creating Date_creation as a character variable length 20 (and Repeat_instance). Either assign a length of 8 (Not $ 😎 to numeric variables or don't place them on a length statement. One suspects was the result of a quicky approach to keep variables in specific column order. But this is causing errors because the character values of Creatio_date cannot use the numeric informat YYMMDD10.
See if this handles the Table1 and Table1_returned as desired.
proc sort data=table1; by record_id Event_name Repeat_instance Form Query Editcheck_ID; run; proc sort data=table1_returned; by record_id Event_name Repeat_instance Form Query Editcheck_ID; run; data example; update table1 (in=inbase) table1_returned (in=inreturn drop=Date_creation) ; by record_id Event_name Repeat_instance Form Query Editcheck_ID; if inreturn and status=:"Closed" then delete; /* what date would be assigned???*/ run;
Really need some rules about what "creation_date" would be changed to. You didn't provide any guideline.
This update might require dropping the values of Creation_date from the Returned table, otherwise matches not dropped would have the creation_date updated to the value in the returned data set. If your returned data has more variables than shown then on the Update statement only the BY variables plus the Status should be kept as dataset options for the returned data.
Thanks for the input, adding 8. in the length statement and a format statement for the date variable changed it to the proper format.
About the "rules" of this date variable, it is automatically created when running the SAS program and its value is the day the program has been ran, through the date() function.
I wasn't aware of the update statement, but by changing the tables used in your example (I wanted to compare table1_return with the newly generated table2) and not dropping the date variable (I wanted to replace the new date by the old date if the query already exists), the outputted example dataset perfectly match the desired result from table2_final.
data example;
update table2 (in=inbase)
table1_returned (in=inreturn)
;
by record_id Event_name Repeat_instance Form Query Editcheck_ID;
if inreturn and status=:"Closed" then delete;
run;
Your solution solved my issue 👌
Glad to help.
Some things to be aware of with UPDATE that you may not catch in reading the documentation the first time.
First is the base data set, the first one on the UPDATE statement can only have ONE observation with the values of the by statement.
Second is the transaction data set, the second on the Update state, may have more than one observation and will apply the all the values from all the observations in sequence.
Any combination of BY variables in the transaction data set not appearing in the base data will be added to the result. From your problem description I couldn't tell if this might be an issue or not. If so you may want to have another IN= variable for the base data set and only write the output data set when the observation is from that set.
To achieve the desired outcome, you can approach this by using a combination of MERGE
and conditional processing in SAS. Here’s how you can proceed:
Step 1: Sort the datasets
To ensure that the MERGE
step aligns the datasets correctly, you need to sort both datasets by the common keys.
Step 2: Merge the datasets
Use the MERGE
statement to join the previous dataset (table1_returned
) with the new dataset (table2
). You can then conditionally update the Date_creation
and Status
based on the logic provided.
Step 3: Filter the results
After the merge, keep only the records that meet your criteria: either update the date or remove answered queries.
Here is a sample SAS code to implement this logic:
/* Step 1: Sort the datasets */
proc sort data=table1_returned;
by record_id Center Event_name Repeat_instance Form Editcheck_ID;
run;
proc sort data=table2;
by record_id Center Event_name Repeat_instance Form Editcheck_ID;
run;
/* Step 2: Merge datasets */
data merged_data;
merge table2(in=new) table1_returned(in=old);
by record_id Center Event_name Repeat_instance Form Editcheck_ID;
/* Step 3: Apply logic */
if new and old then do;
if Status in ('Ongoing') then Date_creation = Date_creation; /* Retain old date if ongoing */
else delete; /* Remove if already closed */
end;
if new and not old then do;
/* New queries remain unchanged */
Status = 'Ongoing';
end;
run;
/* Step 4: Filter final output */
data table2_final;
set merged_data;
if not missing(Status);
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.