BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Reeza
Super User

Leading zero's can get dropped, dates can be read incorrectly  (mmddyy instead of ddmmyy), numbers longer than 16 digits will have the end digits changed if not imported as character

 


@geneshackman wrote:

"If you open a CSV in Excel and then save it to XLSX it will change values."

Really? I didn't know that. What values will change? I'm sure you don't mean it will change a 1 to a 2. Perhaps this is about blanks, missings, like that?


 

geneshackman
Pyrite | Level 9

"Leading zero's can get dropped, dates can be read incorrectly  (mmddyy instead of ddmmyy), numbers longer than 16 digits will have the end digits changed if not imported as character."

Thanks, good to know!

ballardw
Super User

@geneshackman wrote:

"If you open a CSV in Excel and then save it to XLSX it will change values."

Really? I didn't know that. What values will change? I'm sure you don't mean it will change a 1 to a 2. Perhaps this is about blanks, missings, like that?


Changes can be much more insidious than a digit. Certain values that may start as 5-10, such as possibly a part number, can become a date value for May 10 in the current year. Which is then saved internally in the sheet as the number of days since 1 Jan 1900.

If you have values like account or banking numbers that have leading 0 that are critical Excel will try to make them numbers.

So account numbers of:

0005

000005

which would be important in your accounting software would both get reduced to 5. If there are enough digits it may attempt to convert to numeric but run into storage precision limits and then round the values.

Other values with things could be interpreted as apply an operator like subtraction or addition, convert to a percentage or who knows what.

I seem to remember seeing some cases of values in () getting treated as negative values.

 

And these behaviors will be on a cell-by-cell basis. So the result might be fine for the first 500 lines before you find one of the exceptions.

 

geneshackman
Pyrite | Level 9
thanks.
Tom
Super User Tom
Super User

Here is how to share data:

data have;
  infile cards dsd truncover ;
  input cos total underpov region $;
cards;
1,8913,2910,ROS
2,15,2,ROS
58,120347,43223,NYC
3,2923,1423,ROS
4,317,106,ROS
5,338,148,ROS
6,681,428,ROS
7,950,185,ROS
8,58,37,ROS
9,212,56,ROS
10,269,17,ROS
11,220,7,ROS
12,59,15,ROS
13,6108,1195,ROS
14,29917,11954,ROS
15,21,13,ROS
16,71,60,ROS
17,147,147,ROS
18,129,53,ROS
19,128,98,ROS
20,0,0,ROS
21,276,47,ROS
22,1434,459,ROS
59,167871,43213,NYC
23,66,8,ROS
24,206,114,ROS
25,65,31,ROS
26,31252,14751,ROS
27,386,171,ROS
28,32295,2973,ROS
60,41041,15816,NYC
29,3887,1680,ROS
30,3584,2158,ROS
31,15001,8066,ROS
32,490,36,ROS
33,9528,1109,ROS
34,207,63,ROS
35,270,96,ROS
36,200,50,ROS
37,561,40,ROS
61,79161,10967,NYC
38,2347,958,ROS
62,12364,3909,NYC
39,9205,1688,ROS
40,47,15,ROS
41,747,193,ROS
42,4122,1594,ROS
43,50,0,ROS
44,13,5,ROS
45,137,28,ROS
46,353,190,ROS
47,25916,3601,ROS
48,1279,631,ROS
49,53,21,ROS
50,615,288,ROS
51,1744,665,ROS
52,84,15,ROS
53,27,1,ROS
54,432,167,ROS
55,28775,4849,ROS
56,36,15,ROS
57,4,4,ROS
999,647954,182762,NYS
102,420784,117128,
;

Why do you have the 999 and 102 data in there?  Aren't those something you would DERIVE from the actual 62 counties?

proc summary data=have ;
  where cos < 100 ;
  class cos region ;
  types cos region () ;
  var total underpov ;
  output out=want sum= ;
run;

Result

Obs    cos    region    _TYPE_    _FREQ_     total    underpov

  1      .                 0        62      647954     182762
  2      .     NYC         1         5      420784     117128
  3      .     ROS         1        57      227170      65634
  4      1                 2         1        8913       2910
  5      2                 2         1          15          2
  6      3                 2         1        2923       1423
  7      4                 2         1         317        106
  8      5                 2         1         338        148
  9      6                 2         1         681        428
 10      7                 2         1         950        185
 11      8                 2         1          58         37
 12      9                 2         1         212         56
 13     10                 2         1         269         17
 14     11                 2         1         220          7
 15     12                 2         1          59         15
 16     13                 2         1        6108       1195
 17     14                 2         1       29917      11954
 18     15                 2         1          21         13
 19     16                 2         1          71         60
 20     17                 2         1         147        147
 21     18                 2         1         129         53
 22     19                 2         1         128         98
 23     20                 2         1           0          0
 24     21                 2         1         276         47
 25     22                 2         1        1434        459
 26     23                 2         1          66          8
 27     24                 2         1         206        114
 28     25                 2         1          65         31
 29     26                 2         1       31252      14751
 30     27                 2         1         386        171
 31     28                 2         1       32295       2973
 32     29                 2         1        3887       1680
 33     30                 2         1        3584       2158
 34     31                 2         1       15001       8066
 35     32                 2         1         490         36
 36     33                 2         1        9528       1109
 37     34                 2         1         207         63
 38     35                 2         1         270         96
 39     36                 2         1         200         50
 40     37                 2         1         561         40
 41     38                 2         1        2347        958
 42     39                 2         1        9205       1688
 43     40                 2         1          47         15
 44     41                 2         1         747        193
 45     42                 2         1        4122       1594
 46     43                 2         1          50          0
 47     44                 2         1          13          5
 48     45                 2         1         137         28
 49     46                 2         1         353        190
 50     47                 2         1       25916       3601
 51     48                 2         1        1279        631
 52     49                 2         1          53         21
 53     50                 2         1         615        288
 54     51                 2         1        1744        665
 55     52                 2         1          84         15
 56     53                 2         1          27          1
 57     54                 2         1         432        167
 58     55                 2         1       28775       4849
 59     56                 2         1          36         15
 60     57                 2         1           4          4
 61     58                 2         1      120347      43223
 62     59                 2         1      167871      43213
 63     60                 2         1       41041      15816
 64     61                 2         1       79161      10967
 65     62                 2         1       12364       3909
Tom
Super User Tom
Super User

You probably just need a multilabel format.

data have;
  infile cards dsd truncover ;
  input cos total underpov region $;
cards;
1,8913,2910,ROS
2,15,2,ROS
58,120347,43223,NYC
3,2923,1423,ROS
4,317,106,ROS
5,338,148,ROS
6,681,428,ROS
7,950,185,ROS
8,58,37,ROS
9,212,56,ROS
10,269,17,ROS
11,220,7,ROS
12,59,15,ROS
13,6108,1195,ROS
14,29917,11954,ROS
15,21,13,ROS
16,71,60,ROS
17,147,147,ROS
18,129,53,ROS
19,128,98,ROS
20,0,0,ROS
21,276,47,ROS
22,1434,459,ROS
59,167871,43213,NYC
23,66,8,ROS
24,206,114,ROS
25,65,31,ROS
26,31252,14751,ROS
27,386,171,ROS
28,32295,2973,ROS
60,41041,15816,NYC
29,3887,1680,ROS
30,3584,2158,ROS
31,15001,8066,ROS
32,490,36,ROS
33,9528,1109,ROS
34,207,63,ROS
35,270,96,ROS
36,200,50,ROS
37,561,40,ROS
61,79161,10967,NYC
38,2347,958,ROS
62,12364,3909,NYC
39,9205,1688,ROS
40,47,15,ROS
41,747,193,ROS
42,4122,1594,ROS
43,50,0,ROS
44,13,5,ROS
45,137,28,ROS
46,353,190,ROS
47,25916,3601,ROS
48,1279,631,ROS
49,53,21,ROS
50,615,288,ROS
51,1744,665,ROS
52,84,15,ROS
53,27,1,ROS
54,432,167,ROS
55,28775,4849,ROS
56,36,15,ROS
57,4,4,ROS
;
data formats;
  set have end=eof;
  fmtname='REGION';
  hlo='SM';
  start = cos;
  label = region;
  output;
  label = 'NYS';
  output;
  label = 'C' || put(cos,z2.);
  output;
  keep fmtname start label hlo ;
run;

proc sort;
  by start label;
run;

proc format cntlin=formats;
run;

proc summary data=have nway;
  class cos / mlf ;
  format cos region. ;
  var total underpov ;
  output out=want sum= ;
run;

proc print;
run; 
geneshackman
Pyrite | Level 9
Just to mention, I have 999 and 102 because you can get State and NYC from the ACS, so I did. You can't get ROS from ACS.
geneshackman
Pyrite | Level 9

Well, in this results box, cos for ROS, NYC and NYS are blank. My main objective is to get rows that have summary numbers for those three regions AND cos labels.

Tom
Super User Tom
Super User

I would recommend using the multilabel format method.  Ignore the pre calculated summary rows (or at least recalculate them and QC the result against the published numbers).

proc format ;
  value cos (multilabel)
  1-62 = 'NYS'
  1-57 = 'ROS'
  58-62 = 'NYC'
  ;
run;
proc summary data=have nway ;
  class cos / mlf ;
  format cos cos.;
  var total underpov;
  output out=want sum=;
run;
proc print data=want;
run;

Tom_0-1666362646662.png

 

 

geneshackman
Pyrite | Level 9
Thank you Tom, this is exactly what I was looking for.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 24 replies
  • 6425 views
  • 9 likes
  • 5 in conversation