BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kmaths
Calcite | Level 5

I have two datasets, imported from Excel. Dataset 1 contains observations for several thousand people, each identified by a 3-digit alphanumeric code. Several other variables are linked to each of these codes (i.e. gender, age, date, etc). Sometimes the same code is (correctly) used more than once for different people, so there may be say 5 lines each identified by the same code but with different ages, genders, and so on. Dataset 2 contains only a list of each possible 3-digit code that could be assigned (of which there are 526), with no further information. I would like to compare the two datasets, and identify which of the 526 codes found in Dataset 2 are not being used in Dataset 1. I then wish to add these unused codes into Dataset 1, and list them as missing or blank on the other variables (i.e. gender, age, etc) so that it is clear that there are no observations associated with these particular codes at the present time.

I am unsure how to approach this - initially I thought it might be considered a "full outer join" using PROC SQL, but that didn't seem to work. I also considered that I might be able to use PROC COMPARE to identify the values unique to Dataset 2, output them, and then use PROC APPEND to add them into Dataset 1, but I am not sure if this would work, as PROC COMPARE seems to compare only pairs of values. Any advice on how to approach this would be sincerely appreciated! Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

You could do as hinted: add UNDO_POLICY=NONE. Also add option NOWARNRECURS because recursion should not be a problem here :

proc sql undo_policy=none nowarnrecurs;

insert into dataset1 (code)

select code

from dataset2

where code not in (select code from dataset1);

quit;

You'll get a warning about not being able to undo the operation but the insertion will be performed as requested.

PG

PG

View solution in original post

8 REPLIES 8
SASKiwi
PROC Star

I suggest you look at PROC FREQ. This will give you a count of distinct values in a SAS dataset, as well as a report, which can then be joined back to your base dataset so you require.

PGStats
Opal | Level 21

How about:

proc sql;

insert into dataset1 (code)

select code

from dataset2

where dataset2.code not in (select code from dataset1);

quit;

PG

PG
kmaths
Calcite | Level 5

Thank you PGStats, that sounds promising! Currently using this code I am receiving an error, so I think I must be inputting my dataset names incorrectly. In the last line before the quit statement, if I use the full dataset name (i.e. "libraryname.dataset2.code"), I get a syntax error (Error 22-322: "Syntax error, expecting one of the following..."). If I leave out the library name ("i.e. "dataset2.code"), I receive the following error:

"WARNING: This DELETE/INSERT statement recursively references the target table. A consequence of this is a possible data integrity problem.

ERROR: You cannot reopen LIBRARYNAME.DATASET1 for update access with member-level control because LIBRARYNAME.DATASET1 is in use by you in resource environment SQL. ERROR: PROC SQL could not undo this statement if an ERROR were to happen as it could not obtain exclusive access to the data set. This statement will not execute as the SQL option UNDO_POLICY=REQUIRED is in effect."

Reeza
Super User

SAS does not use 3 level names for tables.

A table is referred to as libname.memname

and variables are referred to as either just the variable name or memname.variable.

PGStats
Opal | Level 21

You could do as hinted: add UNDO_POLICY=NONE. Also add option NOWARNRECURS because recursion should not be a problem here :

proc sql undo_policy=none nowarnrecurs;

insert into dataset1 (code)

select code

from dataset2

where code not in (select code from dataset1);

quit;

You'll get a warning about not being able to undo the operation but the insertion will be performed as requested.

PG

PG
Marina
Calcite | Level 5

You can also use merge, and accomplish what you would like in one step. :

data dataset_merge;

     merge     dataset2 (in=a)

                    dataset1 (in=b);

by code;

if a;

if a and not b then code_not_yet_used = 1;

run;

if you would like to have just a list of unused codes:

data dataset_unused_codes (keep=code);

     merge     dataset2 (in=a)

                    dataset1 (in=b);

by code;

if a and not b;

run;

kmaths
Calcite | Level 5

Thank you very much for your help everyone! The code you provided works well, PGStats, and thank you for the syntax for obtaining a list of the unused codes, Marina - that is an excellent way to keep tabs on them and see if it's working as expected. I sincerely appreciate the assistance!

FriedEgg
SAS Employee

data have;

modify have new;

by name;

if _iorc_ = %sysrc(_DSENMR) then do;

output;

_error_=0;

end;

run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1939 views
  • 3 likes
  • 6 in conversation