Hello
I am having an issue with my data steps removing leading 1's from my variable, for example test.prov2 is being sorted by key which I need the dataset sorted by (test.prov2) but in that case the id is losing the leading 1's. How do I not remove the 1's (from variable key)when sorting or merging? Example is attached i want to keep the 1;s in key but they are being dropped.
proc sort data=test.prov2 ;
by key;
format id $20.;
run;
proc sort data=test1.prov;
by key;
run;
data hce.provider_merge5;
merge test.prov2 test1.prov;
length id $20;
by key;
run;
PROC SORT will not change any data in variables, it will only reorder observations and, if NODUPKEY is used, remove complete observations. There will not be any change to the content of variables.
Are you sure that id is only contained in dataset prov2?
For example after the sort
the below data step , i see the attached instead of the 1111s in front of the id, it shows just characters and letters. but i need the ones, why would the ones disappear in the datastep?
proc sort data=prov2 ;
by key;
format id $20.;
run;
Remove the format statement. Run a PROC PRINT before and after the sort, then post the results here.
Do not attach Office documents; you can post the screenshot directly, using the camera icon.
showing before here (before the sort on key) the ids have 111's
after the sort this happens, the 1's vanish and only whatever is left after the 1's appears
Run this (and ONLY this!):
proc print data=test.prov2 (obs=10);
run;
proc sort
data=test.prov2 (obs=10)
out=prov2_test
;
by key;
run;
proc print data=prov2_test;
run;
Post the complete log of the code by copy/pasting it into a window opened with this button:
Also post the result of both PRINTs in one screenshot.
The SAS System 09:09 Saturday, February 4, 2023 1 ;*';*";*/;quit;run; 2 OPTIONS PAGENO=MIN; 3 %LET _CLIENTTASKLABEL='Program'; 4 %LET _CLIENTPROCESSFLOWNAME='Process Flow'; 5 %LET _CLIENTPROJECTPATH=''; 6 %LET _CLIENTPROJECTPATHHOST=''; 7 %LET _CLIENTPROJECTNAME=''; 8 %LET _SASPROGRAMFILE=''; 9 %LET _SASPROGRAMFILEHOST=''; 10 11 ODS _ALL_ CLOSE; 12 OPTIONS DEV=ACTIVEX; 13 GOPTIONS XPIXELS=0 YPIXELS=0; 14 FILENAME EGSR TEMP; 15 ODS tagsets.sasreport13(ID=EGSR) FILE=EGSR 16 STYLE=HtmlBlue 17 STYLESHEET=(URL="file:///C:/Program%20Files%20(x86)/SAS94/x86/SASEnterpriseGuide/7.1/Styles/HtmlBlue.css") 18 NOGTITLE 19 NOGFOOTNOTE 20 GPATH=&sasworklocation 21 ENCODING=UTF8 22 options(rolap="on") 23 ; NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR 24 25 GOPTIONS ACCESSIBLE; 26 proc print data=test.prov2 (obs=10) ; 27 run; NOTE: There were 10 observations read from the data set test.PROV2. NOTE: PROCEDURE PRINT used (Total process time): real time 0.00 seconds user cpu time 0.00 seconds system cpu time 0.01 seconds memory 1459.31k OS Memory 25776.00k Timestamp 02/04/2023 10:04:27 AM Step Count 25 Switch Count 0 Page Faults 0 Page Reclaims 298 Page Swaps 0 Voluntary Context Switches 5 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 0 28 29 proc sort 30 data=test.prov2 (obs=10) 31 out=prov2_test 32 ; 33 by key; 34 run; NOTE: There were 10 observations read from the data set test.PROV2. NOTE: The data set WORK.PROV2_TEST has 10 observations and 2 variables. NOTE: PROCEDURE SORT used (Total process time): 2 The SAS System 09:09 Saturday, February 4, 2023 real time 0.00 seconds user cpu time 0.00 seconds system cpu time 0.00 seconds memory 1293.62k OS Memory 25776.00k Timestamp 02/04/2023 10:04:27 AM Step Count 26 Switch Count 2 Page Faults 0 Page Reclaims 143 Page Swaps 0 Voluntary Context Switches 35 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 0 35 36 proc print data=prov2_test ; 37 run; NOTE: There were 10 observations read from the data set WORK.PROV2_TEST. NOTE: PROCEDURE PRINT used (Total process time): real time 0.00 seconds user cpu time 0.01 seconds system cpu time 0.00 seconds memory 340.03k OS Memory 25260.00k Timestamp 02/04/2023 10:04:27 AM Step Count 27 Switch Count 0 Page Faults 0 Page Reclaims 32 Page Swaps 0 Voluntary Context Switches 0 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 0 38 39 GOPTIONS NOACCESSIBLE; 40 %LET _CLIENTTASKLABEL=; 41 %LET _CLIENTPROCESSFLOWNAME=; 42 %LET _CLIENTPROJECTPATH=; 43 %LET _CLIENTPROJECTPATHHOST=; 44 %LET _CLIENTPROJECTNAME=; 45 %LET _SASPROGRAMFILE=; 46 %LET _SASPROGRAMFILEHOST=; 47 48 ;*';*";*/;quit;run; 49 ODS _ALL_ CLOSE; 50 51 52 QUIT; RUN; 53
2 | 11 | 55 |
3 | 111 | 555 |
4 | 1111 | 5555 |
5 | 11111 | 55555 |
6 | 111111 | 555555 |
7 | 111111d | 5555551 |
8 | 111111a | 5555554 |
9 | 1111111 | 5555555 |
10 | 1111113 | 5555556 |
results from both, i tried your code with a higher number of obs for example 200 and it truncates the 1's again but this logic is working for the obs=10
My code proves that no truncation occurs, but the output clearly shows you have variable value lengths. Don't be surprised that the sort puts other (shorter) values to the top of the dataset when sorting more of the dataset.
yes but when i remove the obs statement from the steps , the id gets truncated in the end table. I did do a search to see if the 11's are somewhere else in the table in the ids but there is none. It does look like truncation is going on. I am just confused on why and how.
Post your .sas7bdat file as an attachment. It is in the data, so we must see it.
I don't understand what you mean by "truncates the 1's". You clearly have values that have different numbers of the digit 1 in the front of them. Your two printouts are also showing different alignments of the values. The first table (the photograph you pasted in of your output) is showing the second column right aligned, like a numeric variable would be. But the second table (the html table you pasted in) is showing the second column left aligned like SAS would show a character variable.
Remember that ODS output "eats" the leading spaces. So if the first column has leading spaces then the HTML output like you seem to be showing will not show them. Regular LISTING output will show the leading spaces in PROC PRINT output. You can use the $QUOTE format with your character variable to have ODS output preserve the leading spaces since they will be enclosed in quotes. So you can see the difference between "111" and " 111" and why the second one would sort before the first.
But if you are actually merging two datasets by KEY and they both contain the variable ID then the value retained will be the one read last. In a normal one-to-one merge that means the last dataset listed "wins". But in a one-to-many or many-to-many merge it gets more complicated as datasets that have run out of observations for that value of the KEY variable are no longer being read in, so they do not overwrite the values read from the other dataset.
for example like i am sorting by key, result is as shown
but if im sorting by id, the 1's dont vanish why is that the case? Like 11111z is being transformed to just z. After doing a simple sort. This issue is not being solved. why is the id being truncated when i sort by key, when i sort by id it stays (the leading 1's)
You seem be comparing apples to oranges. Your first photograph has one variable. Your second photograph has two variables. You cropped out the variable names when you took the picture of your screen so we cannot even tell if they are of the same variables. And we do not know the names of the dataset used to make either picture.
What is it that you are comparing?
Why not just look for ONE of the values that you thing is gone?
So if the variable in the first picture is named ID and the dataset is named HAVE then you can run this code to see the values of ID and KEY for the observations where ID has a value that would probably appear as 111 on your screen.
proc print data=have;
var id key ;
where strip(id)='111';
run;
Sort by a variable will definitely change the order that the values appear.
Example:
data have;
input id :$10. key ;
cards;
1 9
11 8
111 7
1111 6
2 5
234 4
;
proc sort data=have; by id; run;
proc print; run;
proc sort data=have; by key; run;
proc print; run;
Sorting is not your problem. If you have a problem it is probably caused by MERGING.
Your first posted this code :
proc sort data=test.prov2 ;
by key;
format id $20.;
run;
proc sort data=test1.prov;
by key;
run;
data hce.provider_merge5;
merge test.prov2 test1.prov;
length id $20;
by key;
run;
There are two mistakes in the code. First is the addition of a FORMAT statement in the first PROC SORT. That will not impact the sorting, but it might impact what the values look like when you display them. By attaching the $20. format specification to the ID variable in the TEST.PROV2 dataset you will not see anything in the variable past the first 20 bytes. So if ID have a length that is longer than 20 some values will be truncated WHEN DISPLAYED.
The second is the LENGTH statement in the data step. Where it is placed it does nothing. If ID was already on TEST.PROV2 (you should have a WARNING in the SAS Log on the first PROC SORT if it does not exist) then its length in HCE. PROVIDER_MERGE5 will be set by the length it has in TEST.PROV2. And if it is not in TEST.PROV2 then its length will be set by the length it has in TEST1.PROV. Only if it is not in either will the LENGTH statement do anything. And what it will do is create an empty character variable named ID.
If you are really seeing values of ID that exist in the TEST.PROV2 dataset and do not exist in the HCE.PROVIDER_MERGE5 dataset then that is because the value of ID was overwritten by the value of ID read from the TEST1.PROV dataset.
So please tell use information about your two input datasets. Do both datasets have KEY and ID variables? What TYPE of variables are they in each dataset? Are they NUMERIC or CHARACTER. And if they are character what is the LENGTH of the variable? Do any of the variables have any formats permanently attached to them in either dataset.
If ID exists on both datasets then the MERGE will have trouble.
First, as has been explained before, since the value of ID from TEST1.PROV will overwrite the values of ID from TEST.PROV2 for matching values of KEY.
Second it ID is defined LONGER in TEST1.PROV than in TEST.PROV2 then values read from TEST1.PROV could be truncated. But that would be right truncation, the ends of the values would be removed, not the beginnings.
So the most likely cause of values disappearing (or changing) is because you asked SAS to replace the values of ID in TEST.PROV2 with the value of ID from TEST1.PROV instead.
As already mentioned Proc Sort doesn't change values.
I don't understand how the attached document shows the issue you're describing.
Just having a theory: If you move the length statement before the merge statement does the issue still persist? ....and do you get truncation warnings in the SAS log?
data hce.provider_merge5;
length id $20;
merge test.prov2 test1.prov;
by key;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.