BookmarkSubscribeRSS Feed
cash_moonshine
Fluorite | Level 6
Hi everyone!

I am having some trouble getting a proc format and data pull to work. First, here is the code for the format I am creating:

data main_code;
  length fmtname $32 type $1 start $8 label $20 hlo $1;
  keep fmtname type start label hlo ;
  type='C';
  hlo='M';
  set lib.code_data ;
  fmtname='source';
  start=code;
  label=source_colors;
  output;
run;
proc sort nodupkey ; by fmtname start label ; run;
proc format cntlin=main_code; run;

The above code is meant to create a format based on a .csv I have already imported into SAS called code_data. The "source_colors" column of the .csv contains four distinct labels: Red, Green, Blue, and Yellow. Each label corresponds to a code in the .csv, and there are instances where a code will appear for more than one label.

The following code is the data pull I am trying to run after this format:

data lib.new_data (keep = red: green: blue: yellow: i10_pr1-i10_pr31);
set lib.old_data;
array proc_cd_{*} i10_pr1-i10_pr31;
red=0;
    green=0;
    blue=0;
    yellow=0;
do i=1 to 31;
            if put(proc_cd_{i}, $source.)= "Red" then do;
                            red=1;
                            red_qual_pr=proc_cd_{i};
                            red_qual_pr_pos=(i);
            end;
end;
do i=1 to 31;
            if put(proc_cd_{i}, $source.)= "Green" then do;
                            green=1;
                            green_qual_pr=proc_cd_{i};
                            green_qual_pr_pos=(i);
            end;
end;
do i=1 to 31;
            if put(proc_cd_{i}, $source.)= "Blue" then do;
                            blue=1;
                            blue_qual_pr=proc_cd_{i};
                            blue_qual_pr_pos=(i);
            end;
end;
do i=1 to 31;
            if put(proc_cd_{i}, $source.)= "Yellow" then do;
                            yellow=1;
                            yellow_qual_pr=proc_cd_{i};
                            yellow_qual_pr_pos=(i);
            end;
end;

The primary problem here is that the do loops do not appear to work. "Red" and "Green" should be showing up in the final dataset with a similar amount of codes, but what ends up happening is that if a code shows up for Red and Green, this data pull is skipping Red and only giving Green a value of 1. I need it to give both Red and Green values of 1 any time that label appears in the $source format.

 

One workaround I figured out was to create four separate formats for each of red, green, blue, and yellow, but this is inefficient and I was wondering what I can fix to get just the single format above to work for all four labels?

I know this is probably a mess of an explanation, so please feel free to ask questions and I will try to clarify as best as possible.

 

Thanks for any help!

23 REPLIES 23
andreas_lds
Jade | Level 19

Please post some lines of both datasets as data steps using datalines, so that we have something to work with.

Kurt_Bremser
Super User

Multilabel formats are only good for use in procedures that can deal with them. Using such a format in a PUT function will always result in one value taking precedence.

You may be better off no using a format for your task.

Please post usable examples for code_data and old_data (data steps with datalines, do not skip this!), so we have something for testing and can come up with suggestions.

cash_moonshine
Fluorite | Level 6

Apologies for the missing data! I've tried to recreate what they generally import as here:

data code_data;
   input code $ source_colors $;
   datalines;
0DBC0ZZ Red
0DBE0ZZ Red
0DBC0ZZ Green
0DBE0ZZ Green
0DBC0ZZ Blue
0DBE0ZZ Blue
0DBC0ZZ Yellow
0DBE0ZZ Yellow
;
data old_data;
   input I10_PR1 $ ;
   datalines;
0DBC0ZZ
0DBE0ZZ
0DBC0ZZ
0DBE0ZZ
0DBC0ZZ
0DBE0ZZ
0DBC0ZZ
0DBE0ZZ
;

Hopefully this works. Thanks for the advice so far, let me know if I need to add more to these.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

ballardw
Super User

@cash_moonshine wrote:

Apologies for the missing data! I've tried to recreate what they generally import as here:

data code_data;
   input code $ source_colors $;
   datalines;
0DBC0ZZ Red
0DBE0ZZ Red
0DBC0ZZ Green
0DBE0ZZ Green
0DBC0ZZ Blue
0DBE0ZZ Blue
0DBC0ZZ Yellow
0DBE0ZZ Yellow
;
data old_data;
   input I10_PR1 $ ;
   datalines;
0DBC0ZZ
0DBE0ZZ
0DBC0ZZ
0DBE0ZZ
0DBC0ZZ
0DBE0ZZ
0DBC0ZZ
0DBE0ZZ
;

Hopefully this works. Thanks for the advice so far, let me know if I need to add more to these.


Your Code_data has values "Red" "Yellow" and "Blue" and "Green" and those will become the formatted values. Your original data step was testing for a formatted value of "red" "yellow" "blue" and "green". So your "IF" comparisons were using the incorrect comparison. They need to use "Red" "Yellow" etc.

 

It does help to provide what the expected output would be for your data.

ballardw
Super User

Without your data sets, both to create the format and to test the values we really can't say what is happening.

 

A very likely issue may be that you are testing for text "red" but the formatted value is " red" (or similar leading spaces) or "Red" case.

 

One thing to test when creating formats with Cntlin in to create an output set from Proc Format with Cntlout to see what the result could be.

 

Test your data step code with something like to see what the actual formatted values are:

data lib.new_data (keep = red: green: blue: yellow: i10_pr1-i10_pr31);
set lib.old_data;
array proc_cd_{*} i10_pr1-i10_pr31;
red=0;
    green=0;
    blue=0;
    yellow=0;
do i=1 to 31;
            put proc_cd= $20. +1 "Formatted: "  proc_cd_{i}= $source.;
end;
run;

 

One thing to pay attention to is that HLO=M for multilabel formats will only use the multi bit in Proc Summary/Means, Tabulate and Report. This means that only the "top" level formatted value is used for things like "put" in the data step.

cash_moonshine
Fluorite | Level 6

Thanks for your reply!

 

It doesn't look like I have leading spaces or case sensitivity issues here.

 

Sorry, I am new to SAS so I don't quite understand what you mean regarding HLO.

Reeza
Super User

This will create 4 flags, each with two supporting values out of your 31 variables. It will indicate if you have the 4 colours, Red, Green Blue, Yellow. If you have multiple occurrences, you will take only the last values.

Is this what you want to do?

 

 

 

data lib.new_data (keep = red: green: blue: yellow: i10_pr1-i10_pr31);
set lib.old_data;
array proc_cd_{*} i10_pr1-i10_pr31;
red=0;
    green=0;
    blue=0;
    yellow=0;

do i=1 to 31;
            if put(proc_cd_{i}, $source.)= "Red" then do;
                            red=1;
                            red_qual_pr=proc_cd_{i};
                            red_qual_pr_pos=(i);
            end;

            if put(proc_cd_{i}, $source.)= "Green" then do;
                            green=1;
                            green_qual_pr=proc_cd_{i};
                            green_qual_pr_pos=(i);
            end;


            if put(proc_cd_{i}, $source.)= "Blue" then do;
                            blue=1;
                            blue_qual_pr=proc_cd_{i};
                            blue_qual_pr_pos=(i);
            end;


            if put(proc_cd_{i}, $source.)= "Yellow" then do;
                            yellow=1;
                            yellow_qual_pr=proc_cd_{i};
                            yellow_qual_pr_pos=(i);
            end;
end;

run;
Spoiler

@cash_moonshine wrote:
Hi everyone!

I am having some trouble getting a proc format and data pull to work. First, here is the code for the format I am creating:

data main_code;
  length fmtname $32 type $1 start $8 label $20 hlo $1;
  keep fmtname type start label hlo ;
  type='C';
  hlo='M';
  set lib.code_data ;
  fmtname='source';
  start=code;
  label=source_colors;
  output;
run;
proc sort nodupkey ; by fmtname start label ; run;
proc format cntlin=main_code; run;

The above code is meant to create a format based on a .csv I have already imported into SAS called code_data. The "source_colors" column of the .csv contains four distinct labels: Red, Green, Blue, and Yellow. Each label corresponds to a code in the .csv, and there are instances where a code will appear for more than one label.

The following code is the data pull I am trying to run after this format:

data lib.new_data (keep = red: green: blue: yellow: i10_pr1-i10_pr31);
set lib.old_data;
array proc_cd_{*} i10_pr1-i10_pr31;
red=0;
    green=0;
    blue=0;
    yellow=0;
do i=1 to 31;
            if put(proc_cd_{i}, $source.)= "Red" then do;
                            red=1;
                            red_qual_pr=proc_cd_{i};
                            red_qual_pr_pos=(i);
            end;
end;
do i=1 to 31;
            if put(proc_cd_{i}, $source.)= "Green" then do;
                            green=1;
                            green_qual_pr=proc_cd_{i};
                            green_qual_pr_pos=(i);
            end;
end;
do i=1 to 31;
            if put(proc_cd_{i}, $source.)= "Blue" then do;
                            blue=1;
                            blue_qual_pr=proc_cd_{i};
                            blue_qual_pr_pos=(i);
            end;
end;
do i=1 to 31;
            if put(proc_cd_{i}, $source.)= "Yellow" then do;
                            yellow=1;
                            yellow_qual_pr=proc_cd_{i};
                            yellow_qual_pr_pos=(i);
            end;
end;

The primary problem here is that the do loops do not appear to work. "Red" and "Green" should be showing up in the final dataset with a similar amount of codes, but what ends up happening is that if a code shows up for Red and Green, this data pull is skipping Red and only giving Green a value of 1. I need it to give both Red and Green values of 1 any time that label appears in the $source format.

 

One workaround I figured out was to create four separate formats for each of red, green, blue, and yellow, but this is inefficient and I was wondering what I can fix to get just the single format above to work for all four labels?

I know this is probably a mess of an explanation, so please feel free to ask questions and I will try to clarify as best as possible.

 

Thanks for any help!


cash_moonshine
Fluorite | Level 6

Not quite. Let me see if I can explain better. I am new to SAS so this has proven confusing in more ways than one!

 

Basically, my hope was for the data step to iterate through old_data 4 times, once each for Red, Green, Blue, and Yellow. Each time a code appeared with an associated label, I wanted it to give a value of 1 to the new binary variables I created. I don't want these to be mutually exclusive, I want the code to do the same thing for each color. But I was wondering how to do this without the need for four separate formats for each color, just the one format that I provided.

Reeza
Super User

I think your logic and approach is unfortunately in the wrong direction. Please show what you're expecting as output given your sample input. You have 31 one variables how do you want them consolidated into the 4 flags? How do you want the multiples to be handled, eg. in 31 values you have 4 blue, 8 red, 10 yellow and 9 green? Which record do you want to keep. I'm suspecting you may just need a transpose and a format.

 

Please expand your example data (old_data) to have at least two or three variables so it's more reflective of your actual data.

cash_moonshine
Fluorite | Level 6

I would like my output to look something like this:

I10_PR1             Red                Green              Blue                  Yellow

0DBC0ZZ              1                     1                      1                        1

0DBE0ZZ              1                     1                      1                        1
etc.

Instead, what I am getting is this:

I10_PR1             Red                Green              Blue                  Yellow

0DBC0ZZ              0                    1                      1                        1

0DBE0ZZ              0                     1                      1                        1

 

Even though the code is present for Red, it is not being assigned a value of 1 as requested in the do loop.

Kurt_Bremser
Super User

See this example:

data code_data;
   input code $ source_colors $;
   datalines;
0DBC0ZZ Red
0DBE0ZZ Red
0DBC0ZZ Green
0DBE0ZZ Green
0DBC0ZZ Blue
0DBE0ZZ Blue
0DBC0ZZ Yellow
0DBE0ZZ Yellow
;

data old_data;
   input I10_PR1 $ ;
   datalines;
0DBC0ZZ
0DBE0ZZ
0DBC0ZZ
0DBE0ZZ
0DBC0ZZ
0DBE0ZZ
0DBC0ZZ
0DBE0ZZ
;

proc sql;
create table long as
  select distinct
    a.I10_PR1,
    b.source_colors,
    1 as value
  from old_data a left join code_data b
  on a.I10_PR1 = b.code
  order by a.I10_PR1
;
quit;

proc transpose data=long out=wide;
by I10_PR1;
id source_colors;
var value;
run;

Now, your wider dataset:

data old_data;
   input I10_PR1 $ I10_PR2 $ I10_PR3 $;
   datalines;
0DBC0ZZ 0BQS0ZZ 0BQR4ZZ
0DBE0ZZ 07BP4ZZ 0BQT0ZZ
0DBC0ZZ 0BQS0ZZ 0BQR4ZZ
0DBE0ZZ 07BP4ZZ 0BQT0ZZ
0DBC0ZZ 0BQS0ZZ 0BQR4ZZ
0DBE0ZZ 07BP4ZZ 0BQT0ZZ
0DBC0ZZ 0BQS0ZZ 0BQR4ZZ
0DBE0ZZ 07BP4ZZ 0BQT0ZZ
;

would first need to be transposed to long, so you have a single column with codes.

Reeza
Super User

You indicated you have 31 variables. How would it show for more than two variables, how will you scale it? If this is the given input, please show the EXACT output. For a single PR code, how are you getting multiple flags?

 

data code_data;
   input code $ source_colors $;
   datalines;
0DBC0ZZ Red
0DBE0ZZ Red
0DBC0ZZ Green
0DBE0ZZ Green
0DBC0ZZ Blue
0DBE0ZZ Blue
0DBC0ZZ Yellow
0DBE0ZZ Yellow
;
data old_data;
   input I10_PR1 $  I10_PR2 $;
   datalines;
0DBC0ZZ 0DBC0ZZ
0DBE0ZZ 0DBC0ZZ
0DBE0ZZ 0DBC0ZZ
0DBE0ZZ 0DBC0ZZ
0DBE0ZZ 0DBE0ZZ
0DBC0ZZ 0DBE0ZZ
0DBC0ZZ 0DBE0ZZ
0DBC0ZZ 0DBE0ZZ
;

@cash_moonshine wrote:

I would like my output to look something like this:

I10_PR1             Red                Green              Blue                  Yellow

0DBC0ZZ              1                     1                      1                        1

0DBE0ZZ              1                     1                      1                        1
etc.

Instead, what I am getting is this:

I10_PR1             Red                Green              Blue                  Yellow

0DBC0ZZ              0                    1                      1                        1

0DBE0ZZ              0                     1                      1                        1

 

Even though the code is present for Red, it is not being assigned a value of 1 as requested in the do loop.


 

cash_moonshine
Fluorite | Level 6

...

ballardw
Super User

When code seems not to behave you need to show the LOG for the data step or procedure. All of it, the code with all of the notes, warnings or messages. Copy the text from the Log and paste into a text box opened on the forum with the </> above the message window.

The text box can be extremely critical depending on the text in the log. The main message windows will reformat text which will make some of the diagnostics SAS provides out of order.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 23 replies
  • 1282 views
  • 6 likes
  • 6 in conversation