BookmarkSubscribeRSS Feed
Dhana18
Obsidian | Level 7
data uniqueid_count_Provider;
set  uniqueid_count_exclu_other;
if Provider_Name1="PLANNED PARENTHOOD-EXPRESS CAPITOL DRIVE" OR Provider_Name1="PLANNED PARENTHOOD-WI AVENUE" OR Provider_Name1="PLANNED PARENTHOOD-NW" 
OR Provider_Name1="BRADY EAST STD CLINIC (BESTD)" OR Provider_Name1="HOLTON ST/*STD SPECIALTIES CLINIC" then Clinic="NSTD";
if Provider_Name1="MILWAUKEE HEALTH DEPARTMENT - KEENAN HEALTH CENTER" OR Provider_Name1="MILWAUKEE HEALTH DEPARTMENT - STD CLINIC" 
OR Provider_Name1="KHC-STD" OR Provider_Name1="KHC-FDTT" then Clinic="STD";
run;

What is wrong with this code?

 

The log says this:


158 data uniqueid_count_Provider;
159 set uniqueid_count_exclu_other;
160 if Provider_Name1="PLANNED PARENTHOOD-EXPRESS CAPITOL DRIVE" OR Provider_Name1="PLANNED
160! PARENTHOOD-WI AVENUE" OR Provider_Name1="PLANNED PARENTHOOD-NW"
161 OR Provider_Name1="BRADY EAST STD CLINIC (BESTD)" OR Provider_Name1="HOLTON ST/*STD
161! SPECIALTIES CLINIC" then Clinic="NSTD";
162 if Provider_Name1="MILWAUKEE HEALTH DEPARTMENT - KEENAN HEALTH CENTER" OR
162! Provider_Name1="MILWAUKEE HEALTH DEPARTMENT - STD CLINIC"
163 OR Provider_Name1="KHC-STD" OR Provider_Name1="KHC-FDTT" then Clinic="STD";
164 run;

NOTE: There were 1416 observations read from the data set WORK.UNIQUEID_COUNT_EXCLU_OTHER.
NOTE: The data set WORK.UNIQUEID_COUNT_PROVIDER has 1416 observations and 21 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds

 

The result is this:(I am not getting the complete records with STD and NSTD categories.) 

Clinic Frequency CumulativeFrequencySTDFrequency Missing = 28
4040

 

7 REPLIES 7
Tom
Super User Tom
Super User

Looks like none of the conditions in your IF statements where true.

Dhana18
Obsidian | Level 7
Did not understand your comment sorry. Can you please explain a little more?
Reeza
Super User
None of PROVIDER1 values match the values you've typed out.
Run a PROC FREQ on PROVIDER1 variable and look at the counts and see if you have the values you've listed. Note that it must be an exact match for character variables.
Dhana18
Obsidian | Level 7
I actually did proc freq and copied the value into the code.
Tom
Super User Tom
Super User

@Dhana18 wrote:
I actually did proc freq and copied the value into the code.

Double check you ran PROC FREQ on the same dataset. Also make sure the variable does not have a format attached to it that is causing the values you see in the PROC FREQ to be different than the actual values.

You could try using the VVALUE() function in your IF statement so you can test based on the formatted values.

If vvalue(Provider_Name1) in 
("PLANNED PARENTHOOD-EXPRESS CAPITOL DRIVE" 
,"PLANNED PARENTHOOD-WI AVENUE"
,"PLANNED PARENTHOOD-NW"
,"BRADY EAST STD CLINIC (BESTD)"
,"HOLTON ST/*STD SPECIALTIES CLINIC"
) then Clinic="NSTD";
Kurt_Bremser
Super User

What you see in the output of proc freq are formatted values, but in the data step you need to use the raw values, or a put() function with the assigned format.

ballardw
Super User

The first I would do is examine the values that did not get assigned with something like:

data work.junk;
   set uniqueid_count_Provider;
   where missing(clinic);
   keep provider_name1;
run;

Then very carefully examine those results character by character for something that doesn't match.

 

 

Since you are using equal comparisons to assign values then any minor change: missing or inserted space, capitalization difference, or extra/missing character will result in missing values.

 

 

BTW I address such items to an extent on my reading data programs with custom informats that tell me when something is different when the external date file is read. And do such assignments with custom formats:

 Example:

proc format library=work;
invalue $provider (upcase)
"PLANNED PARENTHOOD-EXPRESS CAPITOL DRIVE" = "PLANNED PARENTHOOD-EXPRESS CAPITOL DRIVE"
"PLANNED PARENTHOOD-WI AVENUE" = "PLANNED PARENTHOOD-WI AVENUE"
"BRADY EAST STD CLINIC (BESTD)" = "BRADY EAST STD CLINIC (BESTD)" 
"KHC-STD" ="KHC-STD"
other= _error_;
value $clinic
"PLANNED PARENTHOOD-EXPRESS CAPITOL DRIVE" = "NSTD"
"PLANNED PARENTHOOD-WI AVENUE" = "NSTD"
"BRADY EAST STD CLINIC (BESTD)" = "NSTD" 
"KHC-STD" ="STD"
other="Unknown"
;

data example;
   infile datalines dlm=',';
   informat provider_name1 $provider. ;
   input  provider_name1 somenumber;
datalines;
PLANNED PARENTHOOD-EXPRESS CAPITOL DRIVE,1
planned parenthood-express capitol drive,2
PLANNED PARENTHOOD-EXPRESS CAPITOL DRIV,3
PLANNED PARENTHOOD-WI AVENUE,4
pLANNED PARENTHOOD-WI AVENUE,5
BRADY EAST STD CLINIC (BESTD),6
BRADY EAST STD CLINIC,7
Some completely new clinic,8
KHC-STD,9
;
run;

proc freq data=example;
   tables provider_name1;
   format provider_name1 $clinic.;
run;


Some things to note:

The INVALUE for Provider will correct capitalization issues. It first converts ANY string to upper case before doing the comparison with the listed values. The target string (on the right of the 😃 could be in proper case to make future reading easier, or make a custom display format with the proper case for report writing. The keyword 'other' indicates what to do with any value that does not exactly match the source strings listed. The value _error_ says to treat anything that doesn't match as an error and will write the appropriate invalid data message that any SAS invalid data generates. So when a data source has data entry issues, or as frequently happens with my data, adds a new provider without telling me, I see the value and can go back to the data source to confirm and collect other information I may need about that provider such as address, billing information, contacts or what have your.

 

The custom format $Clinic has the logic to do the assignment you started with using your existing variable. There would be very little need for an additional variable for many purposes.

 

This has the advantage of when a new clinic is added to the data then I add the value to the format and I do not need to change any code for reports. Also, when you have a variable name like Provider_name1 it makes me suspect there may be a Provider_name2, or 3, or 4. And if they should all becoming from the same list they can share a single informat for data checking when reading.

 

For one of my projects I maintain formats for assigning "clinics" to 1)reporting geographical regions 2) reporting types (your STD / NSTD) 3) controlling agencies and have multiple display formats for "short" or "long" provider names. And there are some nice things that can be done with multilabel formats once you get the hang of them. I also have custom informats to check that the values entered for things like insurance, test type, specimen location, test result because things happen when systems are updated or a new reporting entity gets involved and they may take awhile to get things "right".

 

 

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 703 views
  • 0 likes
  • 5 in conversation