Leading zero's can get dropped, dates can be read incorrectly (mmddyy instead of ddmmyy), numbers longer than 16 digits will have the end digits changed if not imported as character.
@geneshackman wrote:
"If you open a CSV in Excel and then save it to XLSX it will change values."
Really? I didn't know that. What values will change? I'm sure you don't mean it will change a 1 to a 2. Perhaps this is about blanks, missings, like that?
"Leading zero's can get dropped, dates can be read incorrectly (mmddyy instead of ddmmyy), numbers longer than 16 digits will have the end digits changed if not imported as character."
Thanks, good to know!
@geneshackman wrote:
"If you open a CSV in Excel and then save it to XLSX it will change values."
Really? I didn't know that. What values will change? I'm sure you don't mean it will change a 1 to a 2. Perhaps this is about blanks, missings, like that?
Changes can be much more insidious than a digit. Certain values that may start as 5-10, such as possibly a part number, can become a date value for May 10 in the current year. Which is then saved internally in the sheet as the number of days since 1 Jan 1900.
If you have values like account or banking numbers that have leading 0 that are critical Excel will try to make them numbers.
So account numbers of:
0005
000005
which would be important in your accounting software would both get reduced to 5. If there are enough digits it may attempt to convert to numeric but run into storage precision limits and then round the values.
Other values with things could be interpreted as apply an operator like subtraction or addition, convert to a percentage or who knows what.
I seem to remember seeing some cases of values in () getting treated as negative values.
And these behaviors will be on a cell-by-cell basis. So the result might be fine for the first 500 lines before you find one of the exceptions.
Here is how to share data:
data have;
infile cards dsd truncover ;
input cos total underpov region $;
cards;
1,8913,2910,ROS
2,15,2,ROS
58,120347,43223,NYC
3,2923,1423,ROS
4,317,106,ROS
5,338,148,ROS
6,681,428,ROS
7,950,185,ROS
8,58,37,ROS
9,212,56,ROS
10,269,17,ROS
11,220,7,ROS
12,59,15,ROS
13,6108,1195,ROS
14,29917,11954,ROS
15,21,13,ROS
16,71,60,ROS
17,147,147,ROS
18,129,53,ROS
19,128,98,ROS
20,0,0,ROS
21,276,47,ROS
22,1434,459,ROS
59,167871,43213,NYC
23,66,8,ROS
24,206,114,ROS
25,65,31,ROS
26,31252,14751,ROS
27,386,171,ROS
28,32295,2973,ROS
60,41041,15816,NYC
29,3887,1680,ROS
30,3584,2158,ROS
31,15001,8066,ROS
32,490,36,ROS
33,9528,1109,ROS
34,207,63,ROS
35,270,96,ROS
36,200,50,ROS
37,561,40,ROS
61,79161,10967,NYC
38,2347,958,ROS
62,12364,3909,NYC
39,9205,1688,ROS
40,47,15,ROS
41,747,193,ROS
42,4122,1594,ROS
43,50,0,ROS
44,13,5,ROS
45,137,28,ROS
46,353,190,ROS
47,25916,3601,ROS
48,1279,631,ROS
49,53,21,ROS
50,615,288,ROS
51,1744,665,ROS
52,84,15,ROS
53,27,1,ROS
54,432,167,ROS
55,28775,4849,ROS
56,36,15,ROS
57,4,4,ROS
999,647954,182762,NYS
102,420784,117128,
;
Why do you have the 999 and 102 data in there? Aren't those something you would DERIVE from the actual 62 counties?
proc summary data=have ;
where cos < 100 ;
class cos region ;
types cos region () ;
var total underpov ;
output out=want sum= ;
run;
Result
Obs cos region _TYPE_ _FREQ_ total underpov 1 . 0 62 647954 182762 2 . NYC 1 5 420784 117128 3 . ROS 1 57 227170 65634 4 1 2 1 8913 2910 5 2 2 1 15 2 6 3 2 1 2923 1423 7 4 2 1 317 106 8 5 2 1 338 148 9 6 2 1 681 428 10 7 2 1 950 185 11 8 2 1 58 37 12 9 2 1 212 56 13 10 2 1 269 17 14 11 2 1 220 7 15 12 2 1 59 15 16 13 2 1 6108 1195 17 14 2 1 29917 11954 18 15 2 1 21 13 19 16 2 1 71 60 20 17 2 1 147 147 21 18 2 1 129 53 22 19 2 1 128 98 23 20 2 1 0 0 24 21 2 1 276 47 25 22 2 1 1434 459 26 23 2 1 66 8 27 24 2 1 206 114 28 25 2 1 65 31 29 26 2 1 31252 14751 30 27 2 1 386 171 31 28 2 1 32295 2973 32 29 2 1 3887 1680 33 30 2 1 3584 2158 34 31 2 1 15001 8066 35 32 2 1 490 36 36 33 2 1 9528 1109 37 34 2 1 207 63 38 35 2 1 270 96 39 36 2 1 200 50 40 37 2 1 561 40 41 38 2 1 2347 958 42 39 2 1 9205 1688 43 40 2 1 47 15 44 41 2 1 747 193 45 42 2 1 4122 1594 46 43 2 1 50 0 47 44 2 1 13 5 48 45 2 1 137 28 49 46 2 1 353 190 50 47 2 1 25916 3601 51 48 2 1 1279 631 52 49 2 1 53 21 53 50 2 1 615 288 54 51 2 1 1744 665 55 52 2 1 84 15 56 53 2 1 27 1 57 54 2 1 432 167 58 55 2 1 28775 4849 59 56 2 1 36 15 60 57 2 1 4 4 61 58 2 1 120347 43223 62 59 2 1 167871 43213 63 60 2 1 41041 15816 64 61 2 1 79161 10967 65 62 2 1 12364 3909
You probably just need a multilabel format.
data have;
infile cards dsd truncover ;
input cos total underpov region $;
cards;
1,8913,2910,ROS
2,15,2,ROS
58,120347,43223,NYC
3,2923,1423,ROS
4,317,106,ROS
5,338,148,ROS
6,681,428,ROS
7,950,185,ROS
8,58,37,ROS
9,212,56,ROS
10,269,17,ROS
11,220,7,ROS
12,59,15,ROS
13,6108,1195,ROS
14,29917,11954,ROS
15,21,13,ROS
16,71,60,ROS
17,147,147,ROS
18,129,53,ROS
19,128,98,ROS
20,0,0,ROS
21,276,47,ROS
22,1434,459,ROS
59,167871,43213,NYC
23,66,8,ROS
24,206,114,ROS
25,65,31,ROS
26,31252,14751,ROS
27,386,171,ROS
28,32295,2973,ROS
60,41041,15816,NYC
29,3887,1680,ROS
30,3584,2158,ROS
31,15001,8066,ROS
32,490,36,ROS
33,9528,1109,ROS
34,207,63,ROS
35,270,96,ROS
36,200,50,ROS
37,561,40,ROS
61,79161,10967,NYC
38,2347,958,ROS
62,12364,3909,NYC
39,9205,1688,ROS
40,47,15,ROS
41,747,193,ROS
42,4122,1594,ROS
43,50,0,ROS
44,13,5,ROS
45,137,28,ROS
46,353,190,ROS
47,25916,3601,ROS
48,1279,631,ROS
49,53,21,ROS
50,615,288,ROS
51,1744,665,ROS
52,84,15,ROS
53,27,1,ROS
54,432,167,ROS
55,28775,4849,ROS
56,36,15,ROS
57,4,4,ROS
;
data formats;
set have end=eof;
fmtname='REGION';
hlo='SM';
start = cos;
label = region;
output;
label = 'NYS';
output;
label = 'C' || put(cos,z2.);
output;
keep fmtname start label hlo ;
run;
proc sort;
by start label;
run;
proc format cntlin=formats;
run;
proc summary data=have nway;
class cos / mlf ;
format cos region. ;
var total underpov ;
output out=want sum= ;
run;
proc print;
run;
Well, in this results box, cos for ROS, NYC and NYS are blank. My main objective is to get rows that have summary numbers for those three regions AND cos labels.
I would recommend using the multilabel format method. Ignore the pre calculated summary rows (or at least recalculate them and QC the result against the published numbers).
proc format ;
value cos (multilabel)
1-62 = 'NYS'
1-57 = 'ROS'
58-62 = 'NYC'
;
run;
proc summary data=have nway ;
class cos / mlf ;
format cos cos.;
var total underpov;
output out=want sum=;
run;
proc print data=want;
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.