
A Detailed Analysis of dtree

Posted: 2009-12-02 14:29:47 Source: 红联 Author: xiaopan3322
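To read the original post, see my blog: http://blog.sina.com.cn/baoxiaopan
After several days of heated discussion with my classmate 阿城 (A Cheng), I finally have a reasonably clear understanding of the script below. It is only two lines long, but it touches on a lot. Learning takes an unending desire to learn and an untiring enthusiasm for it, so I lay out my understanding here. Some passages may not be perfectly clear, but I believe I have conveyed what I meant to convey. If you find mistakes or have something to add, please email me; I would be very grateful. If anything is unclear, feel free to reach me by email or QQ at any time: xiaopan3322@gmail.com, 157526632.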
Description
dtree is a utility that will display a directory hierarchy or tree.
While Linux comes with hundreds of utilities, something you got used to on another system always seems to be missing. One program in this category is something that will display a directory hierarchy or tree.
While some file managers that run under X-Windows will do this sort of task, it is sometimes very handy to have a command-line version. While not Linux-specific, the dtree utility is such a program.
I will first explain how to use dtree, then explain how it works. If you invoke it by just entering its name it will display the directory hierarchy starting at the current directory. If you invoke it with an argument, that argument is used as the starting directory. For example, if you enter dtree /home/fyl/Cool, a tree of directories under /home/fyl/Cool will be displayed.
dtree is written in the finest old-time Unix tradition using common utilities with a short shell script to glue them together. Here is the program:

Script:[code]#!/bin/bash
# print a hierarchy tree starting at
# specified directory (. default)
(cd ${1-.}; pwd)
find ${1-.} -type d -print | sort -f | sed -e "s,^${1-.},," -e "/^$/d" -e 's,[^/]*/\([^/]*\)$,`-----\1,' -e "s,[^/]*/,|      ,g"[/code]
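Script analysis:
1. The first line, (cd ${1-.}; pwd), runs two commands in a subshell. The benefit is clear: the calling shell never leaves its own directory, which for a non-root user also sidesteps some unnecessary permission problems. A further advantage of a subshell is that the directory change and any changes to variables made inside it are not passed back to the parent process, so the parent is left untouched. The intent of this line is therefore clear: it prints the directory being listed, and the subshell keeps that from affecting the rest of the script.
2. ${1-.} in the first line is a choice between two values. The command takes one optional argument, the path the user wants to view. If the user supplies a path, the shell uses $1; if no argument is given, it falls back to the current directory, that is, ".".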
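The two points above can be tried in isolation. This is a minimal sketch assuming bash; show_start is a hypothetical helper name used only for this illustration and is not part of dtree.

```shell
#!/bin/bash
# show_start is a hypothetical helper that mirrors the first line of dtree
show_start() {
    # ${1-.} expands to $1 when an argument was given, otherwise to "."
    (cd "${1-.}" && pwd)
}

before=$PWD
show_start /      # argument given: prints /
show_start        # no argument: prints the current directory
after=$PWD        # same as $before, because cd ran in a subshell
```

Because the cd happens inside the parentheses, the directory recorded in after is the same as the one in before.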
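3. find ${1-.} -type d -print | sort -f is straightforward: it finds every directory under the directory the user supplied (or the current directory), including that starting directory itself, and sorts the names alphabetically without regard to case.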
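A small sketch of this step on a throwaway tree. The directory names are made up for the demo, and LC_ALL=C is my own addition so that the sort order does not depend on the locale; the original script does not set it.

```shell
#!/bin/bash
# Build a temporary tree; the names are invented for this demo
demo=$(mktemp -d)
mkdir -p "$demo/alpha" "$demo/Beta/inner" "$demo/gamma"
touch "$demo/alpha/file.txt"    # a regular file, skipped by -type d

# List directories only, sorted without regard to case
listing=$(cd "$demo" && find . -type d -print | LC_ALL=C sort -f)
printf '%s\n' "$listing"
rm -rf "$demo"
```

Without -f, a byte-wise sort would place ./Beta before ./alpha, because uppercase letters sort before lowercase ones in the C locale.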
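4. Analysis of the second pipe (the sed command):
4.1 The first -e deletes the starting directory (the argument, or the current directory) from the beginning of each line. The line that consisted only of that directory therefore becomes empty. Using Example I, the result up to and including the first -e is: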
tdlteman@hzling06:~$ sh dtree.sh bak_config
/home/tdlteman/bak_config

/bak_script
/bak_script/dos2unix-3.1
/bak_script/dos2unix-3.1/dos2unix-3.1
/bak_script/L2_xp
/bak_script/test
/configFiles
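As you can see, the second line is empty.
4.2 The second -e deletes empty lines; its purpose is to remove the empty line just created. Using Example I, the result after the second -e is: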
/home/tdlteman/bak_config
/bak_script
/bak_script/dos2unix-3.1
/bak_script/dos2unix-3.1/dos2unix-3.1
/bak_script/L2_xp
/bak_script/test
/configFiles
As you can see, the empty second line is gone.
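The first two expressions can be reproduced without a real directory by feeding sed a few lines shaped like the find output. In this sketch the variable start stands in for ${1-.}; it is a name chosen for the illustration only.

```shell
#!/bin/bash
start=bak_config    # stands in for ${1-.} in the script
# Sample lines shaped like the output of find | sort -f in Example I
input='bak_config
bak_config/bak_script
bak_config/bak_script/test
bak_config/configFiles'

# First expression only: the prefix is stripped, the first line becomes empty
step1=$(printf '%s\n' "$input" | sed -e "s,^${start},,")
# First and second expressions: the empty line is deleted as well
step2=$(printf '%s\n' "$input" | sed -e "s,^${start},," -e "/^$/d")
printf '%s\n' "$step2"
```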
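4.3 The key to the third -e is the use of $ and \(..\). In sed, $ anchors the match at the end of the line; for example, /sed$/ matches every line that ends in sed. \(..\) saves the text it matches; for example, with s/\(love\)able/\1rs/, loveable becomes lovers. So [^/]*/\([^/]*\)$ matches, at the end of the line, a run of non-slash characters (possibly empty), then a slash, then the final run of non-slash characters, which is saved as \1. In other words, it matches the last path component together with the component and slash in front of it; whether the line begins with / does not matter. Taking Example I, the first matching line is /bak_script: the part before the slash is empty and bak_script is saved as \1, so the whole match /bak_script is replaced with `-----bak_script. For /bak_script/dos2unix-3.1 the match is bak_script/dos2unix-3.1, which leaves /`-----dos2unix-3.1, and so on. Because the pattern is anchored with $ and the substitution has no g flag, each line is changed at most once, at its end. Using Example I, the result up to and including the third -e is: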
/home/tdlteman/bak_config
`-----bak_script
/`-----dos2unix-3.1
/bak_script/`-----dos2unix-3.1
/`-----L2_xp
/`-----test
`-----configFiles
As you can see, the hierarchy is already taking shape.
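The third expression can be checked on its own with lines shaped like the result of the first two steps; a minimal sketch:

```shell
#!/bin/bash
# Lines as they look after the first two sed expressions (Example I)
input='/bak_script
/bak_script/dos2unix-3.1
/bak_script/dos2unix-3.1/dos2unix-3.1
/configFiles'

# Replace "parent/last" at the end of each line with a backquote,
# five dashes and the last component (saved as \1)
step3=$(printf '%s\n' "$input" | sed -e 's,[^/]*/\([^/]*\)$,`-----\1,')
printf '%s\n' "$step3"
```

Each line loses exactly one slash, the one directly before its last component.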
If you are curious, try removing the first [^/]*/, that is, changing the expression to "s,\([^/]*\)$,\`-----\1,". Here is my test result up to and including the third -e:
/home/tdlteman/bak_config
/`-----bak_script
/bak_script/`-----dos2unix-3.1
/bak_script/dos2unix-3.1/`-----dos2unix-3.1
/bak_script/`-----L2_xp
/bak_script/`-----test
/`-----configFiles
The final result then becomes:
/home/tdlteman/bak_config
|      `-----bak_script
|      |      `-----dos2unix-3.1
|      |      |      `-----dos2unix-3.1
|      |      `-----L2_xp
|      |      `-----test
|      `-----configFiles
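The difference is plain: from the second line on, every line has one extra /, so the final result gains one extra outermost "|" column. The reason is simple. Without [^/]*/, the third expression no longer consumes a slash, so that slash is left for the fourth -e to match. The output of the modified expression therefore always contains one more / than the output of the original, and hence receives one more "|" substitution. The purpose of [^/]*/ in the third expression is thus to remove one more / (together with the characters before it) from every line it rewrites.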
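The effect of the leading [^/]*/ can be seen on a single line. In this sketch count_slashes is a hypothetical helper, introduced only to count the slashes that remain for the fourth expression.

```shell
#!/bin/bash
line='/bak_script/test'

# Original third expression: consumes the parent component and its slash
with_prefix=$(printf '%s\n' "$line" | sed -e 's,[^/]*/\([^/]*\)$,`-----\1,')
# Variant without the leading [^/]*/ : the parent component survives
without_prefix=$(printf '%s\n' "$line" | sed -e 's,\([^/]*\)$,`-----\1,')

# count_slashes (hypothetical helper) counts the "/" characters in its argument
count_slashes() { printf '%s' "$1" | tr -cd '/' | wc -c | tr -d ' '; }

echo "original: $with_prefix has $(count_slashes "$with_prefix") slash(es)"
echo "variant:  $without_prefix has $(count_slashes "$without_prefix") slash(es)"
```

The variant leaves one more slash on the line, which is exactly the extra "|" column in the final output.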
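4.4 The last -e works like the third: it replaces every run of non-slash characters that is followed by a / with "|" and six spaces. The run may be empty, so a / at the very start of a line is replaced as well (a special case of the same rule). After all four -e expressions have run, the final result is: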
/home/tdlteman/bak_config
`-----bak_script
|      `-----dos2unix-3.1
|      |      `-----dos2unix-3.1
|      `-----L2_xp
|      `-----test
`-----configFiles

As you can see, the hierarchy is now complete and clearly layered.
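All four expressions can be run together on fixed input, so the result does not depend on any real directory. This sketch assumes the replacement in the fourth expression is a pipe followed by six spaces, as the original author's explanation states; start is a stand-in name for ${1-.}.

```shell
#!/bin/bash
start=bak_config    # stands in for ${1-.}
input='bak_config
bak_config/bak_script
bak_config/bak_script/dos2unix-3.1
bak_config/bak_script/dos2unix-3.1/dos2unix-3.1
bak_config/bak_script/test
bak_config/configFiles'

# The complete sed command from the script, one expression per step
tree=$(printf '%s\n' "$input" | sed -e "s,^${start},," -e "/^$/d" \
    -e 's,[^/]*/\([^/]*\)$,`-----\1,' -e "s,[^/]*/,|      ,g")
printf '%s\n' "$tree"
```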
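5. In 's,[^/]*/\([^/]*\)$,`-----\1,' the single (hard) quotes can be replaced with double (soft) quotes. With double quotes the backquote must be escaped with \, so the expression becomes "s,[^/]*/\([^/]*\)$,\`-----\1,".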
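A quick check that the two quoting styles hand sed the same program:

```shell
#!/bin/bash
line='/bak_script/test'

# Single quotes: the backquote is an ordinary character
hard=$(printf '%s\n' "$line" | sed -e 's,[^/]*/\([^/]*\)$,`-----\1,')
# Double quotes: the backquote is escaped so the shell does not
# treat it as the start of a command substitution
soft=$(printf '%s\n' "$line" | sed -e "s,[^/]*/\([^/]*\)$,\`-----\1,")

[ "$hard" = "$soft" ] && echo "identical: $hard"
```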
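Note that "sh dtree.sh bak_config" and "sh dtree.sh bak_config/" give slightly different results. The reason is that the string removed by the first -e has changed: with the trailing slash, the leading / of every remaining line is removed too, so the top-level names no longer match the third expression. The process is otherwise exactly the same, so I will not analyze it in detail; only the output is given here.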


Examples:
I:
##### Result of running sh dtree.sh bak_config:
### Output of the first pipe only (without the sed command):
tdlteman@hzling06:~$ sh dtree.sh bak_config
/home/tdlteman/bak_config
bak_config
bak_config/bak_script
bak_config/bak_script/dos2unix-3.1
bak_config/bak_script/dos2unix-3.1/dos2unix-3.1
bak_config/bak_script/L2_xp
bak_config/bak_script/test
bak_config/configFiles
### Output of the complete script:
tdlteman@hzling06:~$ sh dtree.sh bak_config
/home/tdlteman/bak_config
`-----bak_script
|      `-----dos2unix-3.1
|      |      `-----dos2unix-3.1
|      `-----L2_xp
|      `-----test
`-----configFiles

II:
##### Result of running sh dtree.sh bak_config/:
### Output of the first pipe only (without the sed command):
tdlteman@hzling06:~$ sh dtree.sh bak_config/
/home/tdlteman/bak_config
bak_config/
bak_config/bak_script
bak_config/bak_script/dos2unix-3.1
bak_config/bak_script/dos2unix-3.1/dos2unix-3.1
bak_config/bak_script/L2_xp
bak_config/bak_script/test
bak_config/configFiles
### Output of the complete script:
tdlteman@hzling06:~$ sh dtree.sh bak_config/
/home/tdlteman/bak_config
bak_script
`-----dos2unix-3.1
|      `-----dos2unix-3.1
`-----L2_xp
`-----test
configFiles
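The difference between Examples I and II comes down to what the first sed expression strips. The sketch below shows that on one line, followed by one possible way to make both spellings behave alike; strip_trailing_slash is a hypothetical helper and not part of the original script.

```shell
#!/bin/bash
line='bak_config/bak_script'

# Argument without a trailing slash: the leading "/" survives, so the
# third sed expression can later match "/bak_script"
no_slash=$(printf '%s\n' "$line" | sed -e "s,^bak_config,,")
# Argument with a trailing slash: the "/" is stripped as well
with_slash=$(printf '%s\n' "$line" | sed -e "s,^bak_config/,,")
echo "$no_slash"
echo "$with_slash"

# strip_trailing_slash (hypothetical helper) removes one trailing slash
strip_trailing_slash() {
    case $1 in
        /)  printf '%s\n' / ;;           # keep the root directory as is
        */) printf '%s\n' "${1%/}" ;;    # drop one trailing slash
        *)  printf '%s\n' "$1" ;;
    esac
}
strip_trailing_slash bak_config/
```

Passing the normalized name to find and sed would make "bak_config/" produce the same tree as "bak_config".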


Finally, here is the original author's explanation, for those who are interested:
The first line in the output is the name of the directory dtree was run on. This line was produced by the line that begins with (cd. Breaking this line down:
* ${1-.} means use the first argument from the command line ($1) if it is available, otherwise use . which is a synonym for the current directory. Thus, the cd command either changes to the directory specified on the line that invoked dtree or to the current directory (a virtual no-op).
* pwd then displays the path name of the current directory.
* The parentheses around the whole line force the command to be run in a subshell. This means the cd command is local to this line and subsequent commands will be executed from what was the current directory when dtree was initially invoked.
* The find command prints out all files whose type is d (for directory). The same directory reference is used as in cd.
* The output of find is piped into sort and the -f option tells sort to fold upper and lower case names together.
* The tricky formatting of the tree is done by sed in four steps. Each step is set off by -e. This is how you tell sed a program follows.
* The first expression, "s,^${1-.},,", is a substitute command which tells sed to replace everything between the first two delimiters (a comma is used as the delimiter) with everything between the second. The initial ^ causes the match to be performed only at the beginning of the line. The expression that follows is, again, the starting directory reference, and the string between the second pair of delimiters is null. Thus, the requested directory name from the beginning of the output of sort is trimmed.
* The second expression, /^$/d, tells sed to delete all blank lines (lines with nothing between the beginning and the end).
* The third expression is probably the trickiest. It uses the ability to remember a string within a regular expression and then use it later. The expression s,[^/]*/\([^/]*\)$,\`-----\1, tells sed to replace the last two strings separated by a slash (/) with a backquote, five dashes and the last string (following the final slash).
* Lastly, the final expression, -e "s,[^/]*/,|      ,g", tells sed to replace every occurrence of strings that do not contain a slash but are followed by a slash, with a pipe (|) and six spaces.

Unless you are familiar with regular expressions you probably didn't follow all that. But you probably learned something and you can easily use dtree without having to understand how it works.
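For readers who want to experiment, here is one possible variant, offered as a sketch rather than as the author's method. The original uses ${1-.} unquoted and also as a regular expression, so a directory name containing spaces, commas or regex characters can confuse it. The variant below changes into the directory first and always runs find on ".", so the prefix to strip is a fixed string. The function name dtree_sketch is hypothetical, and the six spaces after the pipe follow the author's description above.

```shell
#!/bin/bash
# dtree_sketch is a hypothetical variant of dtree, not the original script
dtree_sketch() {
    (
        # Work inside the target directory so that find always starts at "."
        cd "${1-.}" || exit 1
        pwd
        find . -type d -print | sort -f |
            sed -e 's,^\.,,' -e '/^$/d' \
                -e 's,[^/]*/\([^/]*\)$,`-----\1,' \
                -e 's,[^/]*/,|      ,g'
    )
}

# Example call (path taken from the description at the top of the article):
# dtree_sketch /home/fyl/Cool
```

Since the prefix is always ".", a trailing slash on the argument no longer changes the output.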
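That is about all. The script is not set in stone: modify it to suit your own purposes and you will find plenty of interesting behavior. Changing even a single character changes the result, and that is the charm of shell scripting. Finally, my thanks again to 阿城.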


Finally, I am attaching a valuable classic book:
----Advanced Bash-Scripting Guide_6.1.pdf