I have run into another issue. "Uddannelsescenter Nygård - 10.klasseskolen" appears in the RECORD column, where it instead should appear in the NAME column with "10. klasse" in the RECORD column. As you can see in the picture and code SAS also generates a new row (748) with "Måløhøj skole - Østerhøj" in the NAME column and "10. klasse" in the RECORD column, which I don't want because it doesn't exist in the file I imported from excel.
I think the issue SAS has is that the ID number is the same for the two schools as well as a number appearing in the NAME column. I want the output to be that
"151013 Måløhøjvej Skole - Østerhøj" runs from 0-9 klasse (grade) followed by a row with "151013 Uddannelsescenter Nygård - 10.klasseskolen" with "10.klasse" in the RECORD column.
2010,151012,Skovvejens Skole - Øst,2. klasse,330,0,0,0,30,0,0,150,0,30,0,0,60,30,0,120,30,0,0
2010,151012,Skovvejens Skole - Øst,3. klasse,240,60,0,0,30,0,0,150,0,60,0,0,60,30,0,120,30,0,0
2010,151012,Skovvejens Skole - Øst,4. klasse,180,60,0,0,30,0,0,120,0,60,0,0,30,60,60,90,30,0,0
2010,151012,Skovvejens Skole - Øst,5. klasse,180,90,0,0,30,0,0,120,0,60,0,0,30,30,90,90,30,0,0
2010,151012,Skovvejens Skole - Øst,6. klasse,180,90,0,0,30,0,0,120,0,60,0,0,0,30,120,60,30,0,0
2010,151012,Skovvejens Skole - Øst,7. klasse,180,90,90,90,30,0,0,120,60,0,60,30,0,0,90,60,30,0,0
2010,151012,Skovvejens Skole - Øst,8. klasse,180,90,120,120,0,0,60,120,60,0,60,60,0,0,0,60,30,0,60
2010,151012,Skovvejens Skole - Øst,9. klasse,150,90,120,120,30,0,60,120,90,0,30,30,0,0,0,60,30,0,60
2010,151013,Måløvhøj Skole - Østerhøj,0. klasse,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,750,0
2010,151013,Måløvhøj Skole - Østerhøj,1. klasse,330,0,0,0,60,0,0,150,0,30,0,0,30,30,0,120,30,0,0
2010,151013,Måløvhøj Skole - Østerhøj,2. klasse,330,0,0,0,30,0,0,150,0,30,0,0,60,30,0,120,30,0,0
2010,151013,Måløvhøj Skole - Østerhøj,3. klasse,240,60,0,0,30,0,0,150,0,60,0,0,60,30,0,120,30,0,0
2010,151013,Måløvhøj Skole - Østerhøj,4. klasse,180,60,0,0,30,0,0,120,0,60,0,0,30,60,60,90,30,0,0
2010,151013,Måløvhøj Skole - Østerhøj,5. klasse,180,90,0,0,30,0,0,120,0,60,0,0,60,30,90,90,30,0,0
2010,151013,Måløvhøj Skole - Østerhøj,6. klasse,180,90,0,0,30,0,0,120,0,60,0,0,0,30,120,60,30,0,0
2010,151013,Måløvhøj Skole - Østerhøj,7. klasse,180,90,90,0,30,0,0,120,60,0,60,30,0,0,90,60,30,0,0
2010,151013,Måløvhøj Skole - Østerhøj,8. klasse,180,90,120,0,0,0,60,120,60,0,60,60,0,0,0,60,30,0,0
2010,151013,Måløvhøj Skole - Østerhøj,9. klasse,150,90,120,0,30,0,60,120,90,0,30,30,0,0,0,60,30,0,60
2010,153013,Måløvhøj Skole - Østerhøj,Uddannelsescenter Nygård - 10.Klasseskolen,180,120,120,120,0,0,60,120,60,0,0,0,0,0,0,0,20,0,180
2010,153013,Måløvhøj Skole - Østerhøj,10. klasse,180,120,120,120,0,0,60,120,60,0,0,0,0,0,0,0,20,0,180
2010,153018,Brøndbyøster Skole,0. klasse,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,750,0
2010,153018,Brøndbyøster Skole,1. klasse,330,0,0,0,60,0,0,150,0,30,0,0,45,45,0,60,30,0,0
2010,153018,Brøndbyøster Skole,2. klasse,300,0,0,0,30,0,0,150,0,60,0,0,60,60,0,60,30,0,0
Additionally I used this code to generate the NAME column:
data want;
set have (keep=Skole);
retain flag 0 ID Name;
if anyalpha(skole)=0 and find(skole, '.')=0 then
ID=skole;
else if anydigit(skole)=0 then
Name=skole;
else
do;
Record=Skole;
output;
end;
keep name record ID;
run;
Export the spreadsheet to a csv file, and post ({i} button or attachment) that here.
Or is your second data example a part of that csv? If yes, then the data is what it is, and you need to fix it in the source.
Like this?
My second data example is how it looks after i used the SAS code below the example.
Ah, I see.
Modify your code for setting the three retained variables:
data want;
retain ID Name record;
set have;
if anyalpha(skole) = 0
then ID = skole;
else if find(skole,'.') = 0 or find(skole,'-') > 0
then Name = skole;
else do;
Record = Skole;
output;
end;
drop skole;
run;
Where did you get the year from? That's the only thing missing from my result.
PS the code used to read your csv:
data have;
infile
'$HOME/sascommunity/Pt20101.csv'
encoding='utf-8'
termstr=CR
dlm=';'
firstobs=2
truncover
dsd
;
input
Skole :$50.
bil
bio
bhk
dan
eng
fra
fk
geo
his
hsh
idr
kt
kri
kri_rel
mat
mus
nt
sam
tys
vf
;
run;
I created the variable year myself. I tried reading the code you wrote, but I still don't get the output that I wan't.
When I use the code I posted first (below with a few alterations, but nothing important), I almost get the right output. It seems like SAS don't put schools with a name that contains a number in the SKOLETX column.
data pt_2010;
set prove_2010;
array change _numeric_;
do over change;
if change=. then change=0;
end;
aar=2010.;
retain flag 0 skolekode skoletx;
if anyalpha(skole)=0 and find(skole, '.')=0 then
skolekode=skole;
else if anydigit(skole)=0 then
skoletx=skole;
else
do;
kltrin=Skole;
output;
end;
label Name=skoletx;
drop Skole flag;
run;
data pt_2010check;
set PT_2010;
where kltrin not like '%klasse';
run;
I use the second data step to see if the KLTRIN column has any observations that aren't equal to "klasse" in the usual form e.g. "2. klasse" or "3. klasse" etc. I get the following:
data ARBEJDE.CHECK2010_1;
infile datalines dsd truncover;
input aar:32. skolekode:$64. skoletx:$64. kltrin:$64. dan:BEST. eng:BEST. tys:BEST. fra:BEST. kri:BEST. kri_rel:BEST. sam:BEST. mat:BEST. fk:BEST. nt:BEST. bio:BEST. geo:BEST. bil:BEST. mus:BEST. hsh:BEST. idr:BEST. kt:BEST. bhk:BEST. vf:BEST.;
datalines4;
2010,153013,Måløvhøj Skole - Østerhøj,Uddannelsescenter Nygård - 10.Klasseskolen,180,120,120,120,0,0,60,120,60,0,0,0,0,0,0,0,20,0,180
2010,159038,Skovbrynet skole,"GXU - Gladsaxe 10. Klasse og Ungdomsskole, 10. klasse afdelingen",171,114,114,114,0,0,57,114,57,0,0,0,0,0,0,0,33,0,297
2010,165017,Hyldagerskolen,Det 10. Element Albertslund 10. kl. skole,332,125,81,0,0,0,70,125,60,6,0,0,0,15,0,96,20,0,0
2010,169021,Torstorp Skole,LINIE10,180,180,120,0,0,0,0,180,60,0,0,0,0,0,0,0,60,0,240
2010,183011,Vejlebroskolen,Ishøj 10.klassecenter,224,168,112,0,0,0,0,168,42,0,0,0,0,0,0,14,0,0,78
2010,217033,Byskolen,10.Klasseskolen,180,120,90,90,0,30,30,120,90,0,0,0,0,0,0,0,30,0,330
2010,219024,"Grønnevang Skole, afdeling Østervang",10. klasseskolen,160,120,0,0,0,0,0,160,0,0,0,0,0,0,0,0,40,0,420
2010,269012,Munkekærskolen,10Solrød,145,95,95,95,0,0,45,120,45,0,0,0,0,0,0,45,45,0,240
2010,333021,Antvorskov Skole,Xclass. 10. Klasse-centret i Slagelse,188.57,160.71,102.86,102.86,0,0,51.43,128.57,60,0,38.57,0,141.43,0,0,357.86,122.14,0,51.43
2010,373028,"Kobberbakkeskolen, afd. Sct. Jørgen",10'eren,391,129,99,0,0,0,0,121,50,0,0,0,0,0,0,0,0,0,66
2010,400036,"Hans Rømer Skolen, Afdeling Aaker",10. klasseskolen,180,120,120,0,0,0,0,120,60,0,0,0,0,0,0,60,30,0,390
2010,495005,Juulskovskolen,4kløverskolen,200.2173913,77.5,71.630434783,0,26.847826087,0,25.869565217,125.2173913,44.239130435,18.043478261,34.02173913,26.086956522,13.804347826,15.543478261,37.5,63.47826087,28.260869565,690,25.869565217
2010,515021,Favrdalskolen,10. ved Kløften,150,120,120,0,0,0,0,150,120,0,0,0,0,0,0,0,20,0,180
2010,541010,Tønder Grundskole,Tønder10,160,120,120,0,0,0,60,160,60,0,60,0,0,0,0,60,0,0,60
2010,545016,Høje Kolstrup Skole,10. klasse Aabenraa,180,120,120,0,0,0,0,120,120,0,0,0,0,0,0,0,0,0,240
2010,565014,Grindsted Vestre Skole,10. Klasse Skolen,150,120,120,0,0,0,120,150,90,0,60,0,120,0,120,270,59,0,120
2010,615029,"Horsens Byskole, folkeskoleafdeling","Learnmark Horsens, Step 10",180,120,30,0,0,0,90,150,30,0,0,0,0,30,30,30,30,0,120
2010,631034,Hældagerskolen,10. Klasse UngdomsCenter Vejle,157,99,95,0,0,20,35,103,103,0,0,0,0,0,0,96,20,0,130
2010,707011,Åboulevarden,10. Klasse Center Djursland,174,133,0,0,0,0,60,173,0,0,0,0,0,0,0,0,20,0,280
2010,751116,Kolt Skole,"Center-10, Aarhus High School",150.26344086,114.74731183,0,0,0,0,1.8897849462,147.95698925,55.913978495,0,0,0,0,0,0,10.145698925,0,0,352.6688172
2010,779017,"Grundskoleafdelingen, Ådalskolen",10. Klasse Center Skive,210,105,92,0,0,0,92,105,92,0,92,0,0,0,0,120,0,0,120
2010,785015,Vestervig Skole,Campus 10,150,120,90,0,0,0,0,120,60,0,0,0,0,0,0,0,30,0,420
2010,813211,Elling Skole,Frederikshavn Kommunale Ungdomsskole og 10. klassecenter,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2010,820001,Ulvkærskolen,10. KlasseCentret Vesthimmerland,198,132,111,0,0,15,0,177,69,0,0,0,0,30,0,30,75,0,276
2010,821019,Lundergård undervisningssted,Hjørring Ny 10.,170,105,100,0,0,0,40,145,60,0,0,0,0,0,0,40,298,0,40
2010,823006,Onsild Skole,Hobro Søndre Skole og MF10,218.20093458,71.705607477,56.425233645,0,30.981308411,0,20.747663551,132.19626168,36.168224299,29.299065421,25.23364486,20.817757009,21.238317757,23.971962617,28.808411215,66.939252336,30,840,31.401869159
2010,823011,Bymarkskolen,Mariagerfjord 10. klassecenter,180,120,120,0,0,0,0,120,60,0,0,0,0,0,0,0,0,0,420
;;;;
I want the third column to be replaced by the fourth and the fourth column should only contain "10. klasse" for the above observations e.g. "2010, 823011, Mariagerfjord 10. klassecenter, 10. klasse".
I added an additional condition that seems to take care of most of your problems:
data have;
infile
'$HOME/sascommunity/Pt20101.csv'
encoding='utf-8'
termstr=CR
dlm=';'
firstobs=2
truncover
dsd
;
input
Skole :$50.
bil
bio
bhk
dan
eng
fra
fk
geo
his
hsh
idr
kt
kri
kri_rel
mat
mus
nt
sam
tys
vf
;
run;
data want;
retain ID Name record;
set have;
if anyalpha(skole) = 0
then ID = skole;
else if find(skole,'.') = 0 or find(skole,'-') > 0 or anydigit(substr(skole,1,1)) = 0
then Name = skole;
else do;
Record = Skole;
output;
end;
drop skole;
run;
proc freq data=want;
tables record;
run;
The result from the freq now only shows some very peculiar entries that might even be valid:
--------------------------------------------- 0. klasse 1457 1. klasse 1464 10. Klasse Center Djursland 1 10. Klasse Center Skive 1 10. Klasse Skolen 1 10. Klasse UngdomsCenter Vejle 1 10. KlasseCentret Vesthimmerland 1 10. klasse 180 10. klasse Aabenraa 1 10. klasseskolen 2 10. ved Kløften 1 10.Klasseskolen 1 2. klasse 1460 3. klasse 1464 4. klasse 1460 5. klasse 1457 6. klasse 1450 7. klasse 1136 8. klasse 1060 9. klasse 1057
This might even give you material for going back and cleaning the original data.
It took care of most of my problems. I now only need to get the names for the schools below in the RECORD column moved to the NAME column and have "10. klasse" in the RECORD column instead. Is it possible to modify the code in order to do this?
Can you make up a simple rule for catching those instances? And are the "names" valid at all?
I'd throw that back to the people responsible for the source so that they can clean out their data.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.