Dear community,
I have a dataset (imported from xlsx each time when run the program) that has predefined LENGTH for variables to fit all their values.
eg:
SUBJID RELREC
A100101 Yes
A100102 No
A100103 Unable to adjudicate due to insufficient information
When imported into SAS SUBJID has a length of $27 while RELREC has $57.
Per my client's data standards (specifications), export dataset must have following LENGTH and FORMAT for above variables:
VARIABLE LENGTH FORMAT Expected VALUE
RELREC $2 $NYU. "0" ("No")
"1" ("Yes")
"2" ("Unable to Adjudicate due to....")
My challenge is when applying format and length in DATA step RELREC value is getting truncated and not properly formatted due to data truncation. eg:
RELREC (formatted)
0
Ye
Un
To my understanding LENGTH cannot be shorter than the variable's actual value. I tried to assign permanent format at the first data step, then assign length (to a new variable for renaming it later as RELREC) in a different data step. But still getting the same outcome.
I hope someone from this community had encountered the same issue/or has clear understanding on the behavior of LENGTH vs FORMAT statements and could give some guidance in resolving this issue.
Many thanks in advance!
I see this gives required attributes as format and length, not "the formatted value" which is "0", "1" and "2".
You want a length of $2. Therefore the unformatted value can only be 0, 1, 2 and the formatted value is the text.
Sounds like you need to convert the text in the XLSX file into the values that the format expects.
Note that you cannot change the LENGTH of a character variable once it is defined. You probably need to create a NEW variable that will have the shorter length. You can always rename it to have the desired name.
data want;
set have;
length new_relrec $2;
if relrec='No' then new_relrec='0';
else if relrec='Yes' then new_relrec='1';
else if relrec=:'Unable to' then new_relrec='2';
drop relrec;
rename new_relrec = relrec;
format new_relres $nyu.;
run;
In SAS formats are used to convert values to text and informats are used to convert text to values. Since your values are text you could create either an INFORMAT or a format. So you might create an informat named $NYUI or a format named $NYUR and then replace the IF/THEN/ELSE block with simple assignment statement. Like:
new_relec=input(relrec,$nyuI.);
or
new_relrec=put(relrec,$nyuR.);
Note that in this case since the first two characters of the decoded values you are starting with is enough to uniquely identify the codes you want to map to could skip the renaming part.
data want;
length relrec $2;
set have;
if relrec='No' then relrec='0';
else if relrec='Ye' then relrec='1';
else if relrec='Un' then relrec='2';
format relres $nyu.;
run;
Hi @Tom ,
Thank you for your input!
What I did not understand is I've created format for "YES", "NO" and "UNABLE TO.." values.
like:
run;
In your example below the format is applied to "0", "1" and "2" values of new_relrec variable (not to 'yes', 'no' 'unable to..')..... It does give me desired METADATA (when checked in proc contents), however I did not understand the logic here.
data want;
set have;
length new_relrec $2;
if relrec='No' then new_relrec='0';
else if relrec='Yes' then new_relrec='1';
else if relrec=:'Unable to' then new_relrec='2';
drop relrec;
rename new_relrec = relrec;
format new_relres $nyu.;
run;
In a VALUE Statement in the FORMAT Procedure, the source value has to be on the left of the equal sign, and the target value to the right.
Your format, as defined, could be used for the initial conversion, if the source text was in all uppercase.
You probably want to run something like this after your proc import:
proc format ;
value $convert "No" ="0"
"Yes" ="1"
"U"-"V"="2";
value $nyu (default=57) "0"= "No"
"1"= "Yes"
"2"= "Unable to Adjudicate due to...." ;
run;
data WANT(drop=R);
set HAVE(rename=(RELREC=R));
attrib RELREC length=$2 format=$nyu.;
RELREC=put(R,$convert.);
run;
Hi @ChrisNZ ,
In this example '0', '1', '2' are treated as source values. And they get formatted into "Yes", "No" and "Unable to..".
My problem is the opposite of this. "Yes", "No" and "Unable to.." are source values.. I need to format them into '0', '1' and '2' and fit the string in $2 length.
@Tom 's example works, in one hand, to create a new variable with the length of $2 and conditionally convert RELREC values into "0", "1" and "2" of new variable, and rename it as RELREC. Then apply $NYU. format... But the formatted value is not the original value here. When format removed you cannot see "Yes", "No" and "Unable to.." values...
> In this example '0', '1', '2' are treated as source values. And they get formatted into "Yes", "No" and "Unable to..".
Not at all. The full string is transformed into a new variable containing the digit string, and a format is associated with the new string.
data HAVE;
input SUBJID $8. RELREC $80.;
cards;
A100101 Yes
A100102 No
A100103 Unable to adjudicate due to insufficient information
run;
proc format ;
value $convert "No" ="0"
"Yes" ="1"
"U"-"V"="2";
value $nyu (default=57) "0"= "No"
"1"= "Yes"
"2"= "Unable to Adjudicate due to...." ;
run;
data WANT(drop=R);
set HAVE(rename=(RELREC=R));
attrib RELREC length=$2 format=$nyu.;
RELREC=put(R,$convert.);
run;
proc print; proc contents; run;
| Obs | SUBJID | RELREC |
|---|---|---|
| 1 | A100101 | Yes |
| 2 | A100102 | No |
| 3 | A100103 | Unable to Adjudicate due to.... |
| Data Set Name | WORK.WANT | Observations | 3 |
|---|---|---|---|
| Member Type | DATA | Variables | 2 |
| Engine | V9 | Indexes | 0 |
| Created | 05/08/2021 09:48:32 | Observation Length | 10 |
| Last Modified | 05/08/2021 09:48:32 | Deleted Observations | 0 |
| Protection | Compressed | NO | |
| Data Set Type | Sorted | NO | |
| Label | |||
| Data Representation | WINDOWS_64 | ||
| Encoding | wlatin1 Western (Windows) |
| Engine/Host Dependent Information | |
|---|---|
| Data Set Page Size | 131072 |
| Number of Data Set Pages | 1 |
| First Data Page | 1 |
| Max Obs per Page | 12943 |
| Obs in First Data Page | 3 |
| Number of Data Set Repairs | 0 |
| Filename | K:\SASWORK\_TD17820_NZ8037SPSAS2003_\Prc2\want.sas7bdat |
| Release Created | 9.0401M2 |
| Host Created | X64_SRV12 |
| Alphabetic List of Variables and Attributes | ||||
|---|---|---|---|---|
| # | Variable | Type | Len | Format |
| 2 | RELREC | Char | 2 | $NYU. |
| 1 | SUBJID | Char | 8 | |
I see this gives required attributes as format and length, not "the formatted value" which is "0", "1" and "2". In your print results we still see original values "Yes", "No" and "Unable..."..
I just modified formatting part of your code and see I'm getting both required metadata and value:
proc format;
value $nyu (default=57) "No"= "0"
"Yes"= "1"
"Unable to ........."= "2";
run;
data WANT(drop=R);
set HAVE(rename=(RELREC=R));
attrib RELREC length=$2 format=$nyu. ;
RELREC=put(R,$nyu.);
run;
proc print; proc contents; run;
Couldn't paste contents results here...Please check it and give your feedback. Appreciate your support!
I see this gives required attributes as format and length, not "the formatted value" which is "0", "1" and "2".
You want a length of $2. Therefore the unformatted value can only be 0, 1, 2 and the formatted value is the text.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.