Sort 函数介绍
[list]
[*]Definition and syntax
[*]Sort by numerical order
[*]Sort by ASCII order
[*]Sort by dictionary order
[*]Sort by reverse order
[*]Sort using multiple keys
[*]Generate an array of sort ranks
[*]Sort a hash by its keys (sort + map)
[*]Sort a hash by its values (sort + map)
[*]Sort words in a file and eliminate duplicates
[*]Efficient sorting (sort + map)
[*]Sorting lines on the last field (sort + map)
[*]More efficient sorting (sort + map)
[*]CPAN modules for sorting
[*]Sort words with 4 or more consecutive vowels (sort map grep)
[*]Print the canonical sort order (sort map grep)
[/list] The sort function
sort LIST
sort BLOCK LIST
sort SUBNAME LIST The sort function sorts the LIST and returns the sorted list value. If SUBNAME or BLOCK is omitted, the sort is in standard string comparison order (i.e., an ASCII sort). If SUBNAME is specified, it gives the name of a subroutine that compares two list elements and returns an integer less than, equal to, or greater than 0, depending on whether the elements are in ascending sort order, equal, or descending sort order, respectively. In place of a SUBNAME, you can provide a BLOCK as an anonymous, in-line sort subroutine.
sort函数的作用就是对LIST排序然后返回一个排好序的列表。如果没有SUBNAME或者BLOCK,那么sort函数按照标准的字符串比较顺 序来排序(即ASCII排序)。如果指定了SUBNAME,也就是说,指定一个子程序名来比较列表中的两个元素,根据它们的关系是升序排列、相等,还是降 序排列,来返回一个小于0的整数值、0,大于0的整数值。也可以用一个匿名的、内置有排序的子程序块BLOCK来代替SUBNAME。
The two elements to be compared are passed as the variables $a and $b. They are passed by reference, so don’t modify $a or $b. If you use a subroutine, it cannot be recursive.
用于比较的两个元素分别作为$a和$b传递给SUBNAME或BLOCK。它们是作为引用传递的,所以千万不要修改$a或者$b。如果使用子程序的话,它不能是递归的。
Perl versions prior to 5.6 use the Quicksort algorithm for sorting; version 5.6 and higher use the slightly more efficient Mergesort algorithm. Both are efficient N log N algorithms.
在5.6版本之前,perl使用快速排序算法来排序;5.6之后的版本使用略微高效的归并排序算法。这两种算法都是高效的时间复杂度为Nlog N的算法。?
Sort by numerical order
按数字顺序排序
@array = (8, 2, 32, 1, 4, 16);
print join(' ', sort { $a <=> $b } @array), "\n";
输出结果:
1 2 4 8 16 32 Equivalently:
相同结果地:
sub numerically { $a <=> $b }
print join(' ', sort numerically @array), "\n";
输出结果:
1 2 4 8 16 32 Sort by ASCII order (not dictionary order)
按ASCII顺序排序(不是字典顺序)
@languages = qw(fortran lisp c c++ Perl python java);
print join(’ ’, sort @languages), ”\n”;
Perl c c++ fortran java lisp python
Equivalently:
相同结果地:
print join(' ', sort { $a cmp $b } @languages), "\n"; Watch out if you have some numbers in the data:
如果数据中有数字的话要小心
print join(' ', sort 1 .. 11), "\n";
输出结果:
1 10 11 2 3 4 5 6 7 8 9 Sort by dictionary order
按照字典顺序
use locale;
@array = qw(ASCII ascap at_large atlarge A ARP arp);
@sorted = sort { ($da = lc $a) =~ s/[\W_]+//g;
($db = lc $b) =~ s/[\W_]+//g;
$da cmp $db;
} @array;
print "@sorted\n";
输出结果:
A ARP arp ascap ASCII atlarge at_large The use locale pragma is optional - it makes the code more portable if the data contains international characters. This pragma affects the operators cmp, lt, le, ge, gt and some functions - see the perllocale man page for details.
编译指示use locale是可选的 - 如果数据中包含国际字符的话,这会让代码更加便携。这个编译指示会影响操作符cmp,lt,le,ge,gt和一些函数 - 细节参看perllocal的man页。
Notice that the order of atlarge and at_large was reversed on output, even though their sort order was identical. (The substitution removed the underscore before the sort.) This happened because this example was run using Perl version 5.005_02. Before Perl version 5.6, the sort function would not preserve the order of keys with identical values. The sort function in Perl versions 5.6 and higher preserves this order. (A sorting algorithm that preserves the order of identical keys is called “stable”.)
注意,上面的例子中,atlarge和at_large的位置在结果中反过来了,尽管它们排序的顺序是相同的。(子程序在排序之前移除了下划线。) 位置之所以会反过来,是因为这个程序当前正运行在perl5.005_02之下。在5.6版本之前,对于值相同的键,sort函数是不会保留它们的顺序 的。而5.6之后的版本,sort函数则会保留这种顺序。(这种保留相同值的排列顺序的算法被称之为“固定”。)
Sort by reverse order
逆序排序
To reverse the sort order, reverse the position of the operands of the cmp or <=> operators:
为了做逆序排序,只需要把操作符<=>两边的操作数位置调换即可。
sort { $b <=> $a } @array; or change the sign of the block’s or subroutine’s return value:
或者改变程序块儿或子程序返回值的符号:
sort { -($a <=> $b) } @array; or add the reverse function (slightly less efficient, but perhaps more readable):
或者添加reverse函数(效率低一些,但是可读性更强):
reverse sort { $a <=> $b } @array; Sort using multiple keys
利用多键排序
To sort by multiple keys, use a sort subroutine with multiple tests connected by Perl’s or operator. Put the primary sort test first, the secondary sort test second, and so on.
为了用多键排序,用来排序的子程序需要用or操作符来连接多个测试,第一个键的测试在前,第二个键的测试在后,依此类推。
# An array of references to anonymous hashes
# 一个匿名哈希表的数组引用
@employees = (
{ FIRST => 'Bill', LAST => 'Gates',
SALARY => 600000, AGE => 45 },
{ FIRST => 'George', LAST => 'Tester'
SALARY => 55000, AGE => 29 },
{ FIRST => 'Steve', LAST => 'Ballmer',
SALARY => 600000, AGE => 41 }
{ FIRST => 'Sally', LAST => 'Developer',
SALARY => 55000, AGE => 29 },
{ FIRST => 'Joe', LAST => 'Tester',
SALARY => 55000, AGE => 29 },
);
sub seniority {
$b->{SALARY} <=> $a->{SALARY}
or $b->{AGE} <=> $a->{AGE}
or $a->{LAST} cmp $b->{LAST}
or $a->{FIRST} cmp $b->{FIRST}
}
@ranked = sort seniority @employees;
foreach $emp (@ranked) {
print "$emp->{SALARY}\t$emp->{AGE}\t$emp->{FIRST}
$emp->{LAST}\n";
}
输出结果:
600000 45 Bill Gates
600000 41 Steve Ballmer
55000 29 Sally Developer
55000 29 George Tester
55000 29 Joe Tester Generate an array of sort ranks for a list
针对列表部分范围进行排序,返回数组。
@x = qw(matt elroy jane sally);
@rank[sort { $x[$a] cmp $x[$b] } 0 .. $#x] = 0 .. $#x;
print "@rank\n";
输出结果:
2 0 1 3 Sort a hash by its keys
针对键来对哈希表排序
Hashes are stored in an order determined by Perl’s internal hashing algorithm; the order changes when you add or delete a hash key. If you need to sort a hash, consider storing the data in an array instead and using the preceding recipe (generate an array of sort ranks). Alternatively, you can sort a hash by key value and store the results in an array in which each element is a reference to a hash containing only one key/value pair:
哈希表的存储是根据perl内部的哈希算法决定的顺序来排列的;当你添加或者删除一个哈希键,则该顺序将会发生改变。如果你需要对一个哈希排序,那 么最好将数据存储在数组中,然后使用前面提到的方法(针对列表部分范围进行排序)。或者,你可以根据哈希表的键来排序,并将结果存储在一个数组中,在该数 组中,每个元素都将是哈希表中一个键/值对儿的引用:
%hash = (Donald => Knuth, Alan => Turing, John => Neumann);
@sorted = map { { ($_ => $hash{$_}) } } sort keys %hash;
foreach $hashref (@sorted) {
($key, $value) = each %$hashref;
print "$key => $value\n";
}
输出结果:
Alan => Turing
Donald => Knuth
John => Neumann Sort a hash by its values
Unlike hash keys, hash values are not guaranteed to be unique. If you sort a hash by only its values, the sort order of two elements with the same value may change when you add or delete other values. To do a stable sort by hash values, do a primary sort by value and a secondary sort by key:
与哈希表中的键不通,哈希表中的值没有唯一性。如果你只根据值来排序的化,那么当添加或者删除其他值的时候,相同值的两个元素的排序顺序将会发生改变。为了根据哈希值来作一个稳定的排序,需要先根据值来做第一次排序,然后根据键再做第二次排序:
%hash = ( Elliot => Babbage,
Charles => Babbage,
Grace => Hopper,
Herman => Hollerith
);
@sorted = map { { ($_ => $hash{$_}) } }
sort { $hash{$a} cmp $hash{$b}
or $a cmp $b
} keys %hash;
foreach $hashref (@sorted) {
($key, $value) = each %$hashref;
print "$key => $value\n";
}
输出结果:
Charles => Babbage
Elliot => Babbage
Herman => Hollerith
Grace => Hopper (Elliot Babbage was Charles’s younger brother. He died in abject poverty after numerous attempts to use Charles’s Analytical Engine to predict horse races. And did I tell you about Einstein’s secret life as a circus performer?)
(Elliot Babbage 是 Charles 的小弟弟。他在多次尝试用查理斯分析机预报赛马结果之后死于贫苦之中。我告诉过你爱因斯坦曾当过马戏团演员的故事吗?)
Sort words in a file and eliminate duplicates
对文件中的单词进行排序,并去掉重复的单词。
This Unix “one-liner” displays a sorted list of the unique words in a file. (The \ escapes the following newline so that Unix sees the command as a single line.) The statement uniq{@F} = () uses a hash slice to create a hash whose keys are the unique words in the file; it is semantically equivalent to $uniq{ $F[0], $F[1], ... $F[$#F] } = (); Hash slices have a confusing syntax that combines the array prefix symbol with the hash key symbols {}, but they solve this problem succinctly.
下面这个unix俳句将打印出一个文件中的出现的唯一的单词的列表。(\ 符号忽略新行符,所以unix认为这两行实际是一行程序。)语句uniq{@F} = () 使用一个哈希片断来创造一个哈希表,这个哈希表的键就是文件中唯一的单词;在语义上,它相当于 $uniq{$F0,$F1, ... $F[$#F]} = (); 哈希片断有一个相当容易让人迷惑的语法,就是把数组的标志@和哈希表符号{}组合在一起,但这样处理更加简洁。
perl -0777ane '$, = "\n"; \
@uniq{@F} = (); print sort keys %uniq' file
输出结果:
-0777 - Slurp entire file instead of single line
-a - Autosplit mode, split lines into @F array
-e - Read scrīpt from command line
-n - Loop over scrīpt specified on command line: while (<>) { ... }
$, - Output field separator used by print function
file - File name Efficient sorting: the Orcish maneuver and the Schwartzian transform
The sort subroutine will usually be called many times for each key. If the run time of the sort is critical, use the Orcish maneuver or Schwartzian transform so that each key is evaluated only once.
Consider an example where evaluating the sort key involves a disk access: sorting a list of file names by file modification date.
[list=1]
[*]Brute force - multiple disk accesses for each file
@sorted = sort { -M $a <=> -M $b } @filenames;
[/list]# Cache keys in a hash (the Orcish maneuver)
@sorted = sort { ($modtimes{$a} ||= -M $a) <=>
($modtimes{$b} ||= -M $b)
} @filenames; The Orcish maneuver was popularized by Joseph Hall. The name is derived from the phrase “or cache”, which is the term for the ”||=” operator. (Orcish may also be a reference to the Orcs, a warrior race from the Lord of the Rings trilogy.)
# Precompute keys, sort, extract results
# (The famous Schwartzian transform)
@sorted = map( { $_->[0] }
sort( { $a->[1] <=> $b->[1] }
map({ [$_, -M] } @filenames)
)
); The Schwartzian transform is named for its originator, Randall Schwartz, who is the co-author of the book Learning Perl.
Benchmark results (usr + sys time, as reported by the Perl Benchmark module):
Linux 2.2.14, 333 MHz CPU, 1649 files:
Brute force: 0.14 s Orcish: 0.07 s Schwartzian: 0.09 s
WinNT 4.0 SP6, 133 MHz CPU, 1130 files:
Brute force: 7.38 s Orcish: 1.43 s Schwartzian: 1.38 s The Orcish maneuver is usually more difficult to code and less elegant than the Schwartzian transform. I recommend that you use the Schwartzian transform as your method of choice.
Also, remember these basic rules of optimization: (1) Don’t do it (or don’t do it yet), (2) Make it right before you make it faster, and (3) Make it clear before you make it faster.
Sorting lines on the last field (Schwartzian transform)
Given $str with the contents below (each line is terminated by the newline character, \n):
eir 11 9 2 6 3 1 1 81% 63% 13
oos 10 6 4 3 3 0 4 60% 70% 25
hrh 10 6 4 5 1 2 2 60% 70% 15
spp 10 6 4 3 3 1 3 60% 60% 14
Sort the contents using the last field as the sort key.
$str = join "\n",
map { $_->[0] }
sort { $a->[1] <=> $b->[1] }
map { [ $_, (split)[-1] ] }
split /\n/, $str;
Another approach is to install and use the CPAN module Sort::Fields. For example:
use Sort::Fields;
@sorted = fieldsort [ 6, '2n', '-3n' ] @lines; This statement uses the default column definition, which is a split on whitespace.
Primary sort is an alphabetic (ASCII) sort on column 6.
Secondary sort is a numeric sort on column 2.
Tertiary sort is a reverse numeric sort on column 3.
Efficient sorting revisited: the Guttman-Rosler transform
Given that the Schwartzian transform is the usually best option for efficient sorting in Perl, is there any way to improve on it? Yes, there is! The Perl sort function is optimized for the default sort, which is in ASCII order. The Guttman-Rosler transform (GRT) does some additional work in the enclosing map functions so that the sort is done in the default order. The Guttman-Rosler transform was first published by Michal Rutka and popularized by Uri Guttman and Larry Rosler.
Consider an example where you want to sort an array of dates. Given data in the format YYYY/MM/DD:
@dates = qw(2001/1/1 2001/07/04 1999/12/25); what is the most efficient way to sort it in order of increasing date?
The straightforward Schwartzian transform would be:
@sorted = map { $_->[0] }
sort { $a->[1] <=> $b->[1]
or $a->[2] <=> $b->[2]
or $a->[3] <=> $b->[3]
}
map { [ $_, split m> $_, 3 ] } @dates; The more efficient Guttman-Rosler transform is:
@sorted = map { substr $_, 10 }
sort
map { m|(\d\d\d\d)/(\d+)/(\d+)|;
sprintf "%d-%02d-%02d%s", $1, $2, $3, $_
} @dates; The GRT solution is harder to code and less readable than the Schwartzian transform solution, so I recommend you use the GRT only in extreme circumstances. For tests using large datasets, Perl 5.005_03 and Linux 2.2.14, the GRT was 1.7 times faster than the Schwartzian transform. For tests using Perl 5.005_02 and Windows NT 4.0 SP6, the GRT was 2.5 times faster. Using the timing data from the first efficiency example, the Guttman-Rosler transform was faster than a brute force sort by a factor of 2.7 and 13 on Linux and Windows, respectively.
If you are still not satisfied, you may be able to speed up your sorts by upgrading to a more recent version of Perl. The sort function in Perl versions 5.6 and higher uses the Mergesort algorithm, which (depending on the data) is slightly faster than the Quicksort algorithm used by the sort function in previous versions.
Again, remember these basic rules of optimization: (1) Don’t do it (or don’t do it yet), (2) Make it right before you make it faster, and (3) Make it clear before you make it faster.
Some CPAN modules for sorting
These modules can be downloaded from http://search.cpan.org/.
File::Sort - Sort one or more text files by lines
Sort::Fields - Sort lines using one or more columns as the sort key(s)
Sort::ArbBiLex - Construct sort functions for arbitrary sort orders
Text::BibTeX::BibSort - Generate sort keys for bibliographic entries.
Sort::Maker - Automatic generation of ST, GRT, etc. sorters.
Sort::Key - The fastest way to sort in Perl. The open source community is adding to CPAN regularly - use the search engine to check for new modules. If one of the CPAN sorting modules can be modified to suit your needs, contact the author and/or post your idea to the Usenet group comp.lang.perl.modules. If you write a useful, generalized sorting module, please contribute it to CPAN!
The holy grail of Perl data transforms
Challenge: Find a practical problem that can be
effectively solved by a statement using map, sort and grep. The code should run faster and be more compact than alternative solutions.
Print sorted list of words with 4 or more consecutive vowels
perl -e 'print sort map { uc } grep /[aeiou]{4}/, <>' \
/usr/dict/words
输出结果:
AQUEOUS
DEQUEUE
DEQUEUED
DEQUEUES
DEQUEUING
ENQUEUE
ENQUEUED
ENQUEUES
HAWAIIAN
OBSEQUIOUS
PHARMACOPOEIA
QUEUE
QUEUED
QUEUEING
QUEUER
QUEUERS
QUEUES
QUEUING
SEQUOIA (Pharmacopoeia is an official book containing a list of drugs with articles on their preparation and use.)
Print the canonical sort order
This prints the canonical order of characters used by the cmp, lt, gt, le and ge operators:
print +(sort grep /\w/, map { chr } 0 .. 255), "\n";
输出结果:
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz The map converts the numbers to their ASCII value; the grep gets rid of special characters and funky control characters that mess up your screen. The plus sign helps Perl interpret the syntax print (...) correctly.
This example shows that, in this case, the expression '_' lt 'a' is TRUE. If your program has the “use locale” pragma, the canonical order will depend on the program’s current locale.
秋月 于 2007-07-12 01:04:12发表:
高深