BookmarkSubscribeRSS Feed
Rsadiq
Calcite | Level 5

Hello

 

I am having an issue with my data steps removing leading 1's from my variable, for example test.prov2 is being sorted by key which I need the dataset sorted by (test.prov2) but in that case the id is losing the leading 1's. How do I not remove the 1's (from variable key)when sorting or merging? Example is attached i want to keep the 1;s in key but they are being dropped.

 

 

proc sort data=test.prov2 ;
by key;
format id $20.;
run;


proc sort data=test1.prov;
by key;
run;

 

data hce.provider_merge5;
merge test.prov2 test1.prov;
length id $20;
by key;
run;

19 REPLIES 19
Kurt_Bremser
Super User

PROC SORT will not change any data in variables, it will only reorder observations and, if NODUPKEY is used, remove complete observations. There will not be any change to the content of variables.
Are you sure that id is only contained in dataset prov2?

Rsadiq
Calcite | Level 5

For example after the sort 

the below data step , i see the attached instead of the 1111s in front of the id, it shows just characters and letters. but i need the ones, why would the ones disappear in the datastep?

 

proc sort data=prov2 ;
by key;
format id $20.;
run;

Kurt_Bremser
Super User

Remove the format statement. Run a PROC PRINT before and after the sort, then post the results here.

Do not attach Office documents; you can post the screenshot directly, using the camera icon.

Rsadiq
Calcite | Level 5

Rsadiq_0-1675525299765.pngshowing before here (before the sort on key) the ids have 111's

 

after the sort this happens, the 1's vanish and only whatever is left after the 1's appears 

 

Rsadiq_1-1675525354519.png

 

 

Kurt_Bremser
Super User

Run this (and ONLY this!):

proc print data=test.prov2 (obs=10);
run;

proc sort
  data=test.prov2 (obs=10)
  out=prov2_test
;
by key;
run;

proc print data=prov2_test;
run;

Post the complete log of the code by copy/pasting it into a window opened with this button:

Bildschirmfoto 2020-04-07 um 08.32.59.jpg

Also post the result of both PRINTs in one screenshot.

 

Rsadiq
Calcite | Level 5
                                                         The SAS System                           09:09 Saturday, February 4, 2023

1          ;*';*";*/;quit;run;
2          OPTIONS PAGENO=MIN;
3          %LET _CLIENTTASKLABEL='Program';
4          %LET _CLIENTPROCESSFLOWNAME='Process Flow';
5          %LET _CLIENTPROJECTPATH='';
6          %LET _CLIENTPROJECTPATHHOST='';
7          %LET _CLIENTPROJECTNAME='';
8          %LET _SASPROGRAMFILE='';
9          %LET _SASPROGRAMFILEHOST='';
10         
11         ODS _ALL_ CLOSE;
12         OPTIONS DEV=ACTIVEX;
13         GOPTIONS XPIXELS=0 YPIXELS=0;
14         FILENAME EGSR TEMP;
15         ODS tagsets.sasreport13(ID=EGSR) FILE=EGSR
16             STYLE=HtmlBlue
17             STYLESHEET=(URL="file:///C:/Program%20Files%20(x86)/SAS94/x86/SASEnterpriseGuide/7.1/Styles/HtmlBlue.css")
18             NOGTITLE
19             NOGFOOTNOTE
20             GPATH=&sasworklocation
21             ENCODING=UTF8
22             options(rolap="on")
23         ;
NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR
24         
25         GOPTIONS ACCESSIBLE;
26         proc print data=test.prov2 (obs=10) ;
27         run;

NOTE: There were 10 observations read from the data set test.PROV2.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.01 seconds
      memory              1459.31k
      OS Memory           25776.00k
      Timestamp           02/04/2023 10:04:27 AM
      Step Count                        25  Switch Count  0
      Page Faults                       0
      Page Reclaims                     298
      Page Swaps                        0
      Voluntary Context Switches        5
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           0
      

28         
29         proc sort
30           data=test.prov2 (obs=10)
31           out=prov2_test
32         ;
33         by key;
34         run;

NOTE: There were 10 observations read from the data set test.PROV2.
NOTE: The data set WORK.PROV2_TEST has 10 observations and 2 variables.
NOTE: PROCEDURE SORT used (Total process time):
2                                                          The SAS System                           09:09 Saturday, February 4, 2023

      real time           0.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              1293.62k
      OS Memory           25776.00k
      Timestamp           02/04/2023 10:04:27 AM
      Step Count                        26  Switch Count  2
      Page Faults                       0
      Page Reclaims                     143
      Page Swaps                        0
      Voluntary Context Switches        35
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           0
      

35         
36         proc print data=prov2_test ;
37         run;

NOTE: There were 10 observations read from the data set WORK.PROV2_TEST.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.00 seconds
      user cpu time       0.01 seconds
      system cpu time     0.00 seconds
      memory              340.03k
      OS Memory           25260.00k
      Timestamp           02/04/2023 10:04:27 AM
      Step Count                        27  Switch Count  0
      Page Faults                       0
      Page Reclaims                     32
      Page Swaps                        0
      Voluntary Context Switches        0
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           0
      

38         
39         GOPTIONS NOACCESSIBLE;
40         %LET _CLIENTTASKLABEL=;
41         %LET _CLIENTPROCESSFLOWNAME=;
42         %LET _CLIENTPROJECTPATH=;
43         %LET _CLIENTPROJECTPATHHOST=;
44         %LET _CLIENTPROJECTNAME=;
45         %LET _SASPROGRAMFILE=;
46         %LET _SASPROGRAMFILEHOST=;
47         
48         ;*';*";*/;quit;run;
49         ODS _ALL_ CLOSE;
50         
51         
52         QUIT; RUN;
53         

Rsadiq_0-1675526850090.png

115Obs prov_id PROVIDER_KEY
21155
3111555
411115555
51111155555
6111111555555
7111111d5555551
8111111a5555554
911111115555555
1011111135555556

results from both, i tried your code with a higher number of obs for example 200 and it truncates the 1's again but this logic is working for the obs=10

Kurt_Bremser
Super User

My code proves that no truncation occurs, but the output clearly shows you have variable value lengths. Don't be surprised that the sort puts other (shorter) values to the top of the dataset when sorting more of the dataset.

 

Rsadiq
Calcite | Level 5

yes but when i remove the obs statement from the steps , the id gets truncated in the end table. I did do a search to see if the 11's are somewhere else in the table in the ids but there is none. It does look like truncation is going on. I am just confused on why and how. 

Tom
Super User Tom
Super User

I don't understand what you mean by "truncates the 1's".  You clearly have values that have different numbers of the digit 1 in the front of them.  Your two printouts are also showing different alignments of the values.  The first table (the photograph you pasted in of your output) is showing the second column right aligned, like a numeric variable would be.  But the second table (the html table you pasted in) is showing the second column left aligned like SAS would show a character variable.  

 

Remember that ODS output "eats" the leading spaces.  So if the first column has leading spaces then the HTML output like you seem to be showing will not show them.  Regular LISTING output will show the leading spaces in PROC PRINT output.  You can use the $QUOTE format with your character variable to have ODS output preserve the leading spaces since they will be enclosed in quotes. So you can see the difference between "111" and "   111" and why the second one would sort before the first.

 

But if you are actually merging two datasets by KEY and they both contain the variable ID then the value retained will be the one read last.  In a normal one-to-one merge that means the last dataset listed "wins".  But in a one-to-many or many-to-many merge it gets more complicated as datasets that have run out of observations for that value of the KEY variable are no longer being read in, so they do not overwrite the values read from the other dataset.

 

 

Rsadiq
Calcite | Level 5

for example like i am sorting by key, result is as shown 

Rsadiq_0-1675533460748.png

 

but if im sorting by id, the 1's dont vanish why is that the case? Like 11111z is being transformed to just z. After doing a simple sort. This issue is not being solved. why is the id being truncated when i sort by key, when i sort by id it stays (the leading 1's)

Rsadiq_1-1675533534485.png

 

 

 

Tom
Super User Tom
Super User

You seem be comparing apples to oranges.  Your first photograph has one variable. Your second photograph has two variables. You cropped out the variable names when you took the picture of your screen so we cannot even tell if they are of the same variables.  And we do not know the names of the dataset used to make either picture.

 

What is it that you are comparing?  

 

Why not just look for ONE of the values that you thing is gone?

 

So if the variable in the first picture is named ID and the dataset is named HAVE then you can run this code to see the values of ID and KEY for the observations where ID has a value that would probably appear as 111 on your screen.

proc print data=have;
  var id key ;
  where strip(id)='111';
run;

Sort by a variable will definitely change the order that the values appear.

Example:

data have;
  input id :$10. key ;
cards;
1     9
11    8
111   7
1111  6
2     5
234   4
;

proc sort data=have; by id; run;
proc print; run;
proc sort data=have; by key; run;
proc print; run;

Tom_0-1675534929777.png

 

Tom
Super User Tom
Super User

Sorting is not your problem.  If you have a problem it is probably caused by MERGING.

 

Your first posted this code :

proc sort data=test.prov2 ;
by key;
format id $20.;
run;

proc sort data=test1.prov;
by key;
run;

data hce.provider_merge5;
merge test.prov2 test1.prov;
length id $20;
by key;
run;

There are two mistakes in the code.  First is the addition of a FORMAT statement in the first PROC SORT.  That will not impact the sorting, but it might impact what the values look like when you display them.  By attaching the $20. format specification to the ID variable in the TEST.PROV2 dataset you will not see anything in the variable past the first 20 bytes.  So if ID have a length that is longer than 20 some values will be truncated WHEN DISPLAYED.

 

The second is the LENGTH statement in the data step.  Where it is placed it does nothing.  If ID was already on TEST.PROV2 (you should have a WARNING in the SAS Log on the first PROC SORT if it does not exist) then its length in HCE. PROVIDER_MERGE5 will be set by the length it has in TEST.PROV2.  And if it is not in TEST.PROV2 then its length will be set by the length it has in TEST1.PROV.  Only if it is not in either will the LENGTH statement do anything. And what it will do is create an empty character variable named ID.

 

If you are really seeing values of ID that exist in the TEST.PROV2 dataset and do not exist in the HCE.PROVIDER_MERGE5 dataset then that is because the value of ID was overwritten by the value of ID read from the TEST1.PROV dataset.

 

So please tell use information about your two input datasets.  Do both datasets have KEY and ID variables?  What TYPE of variables are they in each dataset? Are they NUMERIC or CHARACTER.  And if they are character what is the LENGTH of the variable?  Do any of the variables have any formats permanently attached to them in either dataset.

 

If ID exists on both datasets then the MERGE will have trouble.

 

First, as has been explained before, since the value of ID from TEST1.PROV will overwrite the values of ID from TEST.PROV2 for matching values of KEY.

 

Second it ID is defined LONGER in TEST1.PROV than in TEST.PROV2 then values read from TEST1.PROV could be truncated. But that would be right truncation, the ends of the values would be removed, not the beginnings.

 

So the most likely cause of values disappearing (or changing) is because you asked SAS to replace the values of ID in TEST.PROV2 with the value of ID from TEST1.PROV instead.

Patrick
Opal | Level 21

As already mentioned Proc Sort doesn't change values.

 

I don't understand how the attached document shows the issue you're describing. 

 

Just having a theory: If you move the length statement before the merge statement does the issue still persist?  ....and do you get truncation warnings in the SAS log?

data hce.provider_merge5;
length id $20;
merge test.prov2 test1.prov;
by key;
run;

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 19 replies
  • 884 views
  • 1 like
  • 5 in conversation