Sunday, April 19, 2009

Perl built-in special variables

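  $- Number of lines left on the current page; part of Perl's format system
  $! Error number or error string, depending on context
  $" List separator
  $# Default output format for printed numbers (deprecated)
  $$ Process ID of the Perl interpreter
  $% Current page number of the current output channel
  $& String matched by the last pattern match
  $( Real group ID of the current process
  $) Effective group ID of the current process
  $* Set to 1 to enable multi-line matching. Now mostly replaced by the /s and /m modifiers.
  $, Current output field separator
  $. Current input line number of the last file read
  $/ Current input record separator; a newline by default
  $: Set of characters after which a string may be broken to fill continuation fields
  $; Subscript separator used when emulating multidimensional arrays
  $? Status returned by the last external command
  $@ Error message returned by the Perl interpreter from the last eval
  $[ Index of the first element in an array
  $\ Current output record separator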
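  $] Version and patch level of the Perl interpreter
  $^ Name of the current top-of-page format for the current output channel
  $^A Accumulator that holds formatted data before it is printed
  $^D Current value of the debugging flags
  $^E Extended operating-system error information on non-UNIX platforms
  $^F Maximum system file descriptor
  $^H Compile-time syntax-check hints enabled by the compiler
  $^I Value of the in-place edit extension
  $^L Form-feed character sent to the output channel
  $^M Buffer reserved as an emergency memory pool
  $^O Operating system name
  $^P Internal variable holding the current debugging flags
  $^R Result of the last evaluated regular-expression code block
  $^S Current state of the interpreter
  $^T Time at which the script started running, in seconds since the epoch
  $^W Current value of the warning switch
  $^X Name of the Perl binary being executed
  $_ Default input and pattern-matching space
  $| Controls buffering on the currently selected output filehandle
  $~ Name of the current report format
  $` String preceding the last pattern match
  $' String following the last pattern match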
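  $+ Text matched by the last bracket of the last regular-expression search
  $< Real user ID of the user running the interpreter
  $<digit> ($1, $2, ...) Text matched by the corresponding set of parentheses in the last pattern match
  $= Page length (number of printable lines) of the current page
  $> Effective user ID of the current process
  $0 Name of the file containing the script being executed
  $ARGV Name of the current file when reading from the default filehandle
  %ENV Environment variables
  %INC Files included via do, require or use
  %SIG Signals and their handlers
  @_ Arguments passed to a subroutine
  @ARGV Command-line arguments passed to the script
  @INC Directories searched when loading modules
  $-[0] and $+[0] Start and end offsets of the last successful match within the matched string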
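
A few of the variables listed above can be tried out in a short script. This is only a minimal sketch: the sample strings and the variable names ($joined, $matched and so on) are made up for illustration.

```perl
use strict;
use warnings;

# $" (list separator) controls how an array is joined when interpolated.
my @fields = ('a', 'b', 'c');
my $joined;
{
    local $" = '-';          # localized so the change does not leak out
    $joined = "@fields";
}

# After a successful match, $& is the matched text, $1 the first group,
# and $-[0] / $+[0] are the start and end offsets of the whole match.
my $text = 'hello world';
my ($matched, $group, $start, $end);
if ($text =~ /w(or)ld/) {
    ($matched, $group, $start, $end) = ($&, $1, $-[0], $+[0]);
}

# $@ holds the error message from the last eval.
eval { die "boom\n" };
my $err = $@;

print "joined=$joined matched=$matched group=$group span=$start..$end err=$err";
```

Using local for $" keeps the change confined to the enclosing block, which is the usual precaution when touching any of these global variables.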

Monday, March 30, 2009

[perl] Several ways to modify @INC

The @INC Array

@INC is a special Perl variable that is the equivalent of the shell's PATH variable. Whereas PATH contains a list of directories to search for executables, @INC contains a list of directories from which Perl modules and libraries can be loaded.

When you use(), require() or do() a filename or a module, Perl gets a list of directories from the @INC variable and searches them for the file it was requested to load. If the file that you want to load is not located in one of the listed directories, then you have to tell Perl where to find the file. You can either provide a path relative to one of the directories in @INC, or you can provide the full path to the file.
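
The search order can be imitated by hand. The sketch below is not how Perl implements the lookup internally; find_in_dirs is a hypothetical helper that simply walks a list of directories in order and returns the first place where the file exists.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the first path under the given directories
# where $file exists, or undef. This only imitates the search order;
# the real lookup happens inside require().
sub find_in_dirs {
    my ($file, @dirs) = @_;
    for my $dir (@dirs) {
        next if ref $dir;    # @INC may also hold code-ref hooks; skip them
        my $path = File::Spec->catfile($dir, $file);
        return $path if -f $path;
    }
    return;
}

my $found = find_in_dirs('strict.pm', @INC);
print defined $found ? "strict.pm => $found\n" : "strict.pm not found\n";
```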

The %INC Hash

%INC is another special Perl variable that is used to cache the names of the files and the modules that were successfully loaded and compiled by use(), require() or do() statements. Before attempting to load a file or a module with use() or require(), Perl checks whether it's already in the %INC hash. If it's there, then the loading and therefore the compilation are not performed at all. Otherwise, the file is loaded into memory and an attempt is made to compile it. do() does unconditional loading -- no lookup in the %INC hash is made.

If the file is successfully loaded and compiled, then a new key-value pair is added to %INC. The key is the name of the file or module as it was passed to one of the three functions we have just mentioned (a module name is stored in its file-name form, e.g. strict.pm). If it was found in any of the @INC directories except ".", then the value is the full path to it in the file system.
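
The caching described above can be observed with a throwaway module. This is a sketch under stated assumptions: Counter.pm and the $loads counter are invented for the demonstration, and the module is written to a temporary directory that is added to @INC at run time.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Write a throwaway module (hypothetical name Counter.pm) that bumps a
# global each time its file is actually compiled and run.
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'Counter.pm');
open my $fh, '>', $file or die "cannot write $file: $!";
print {$fh} "package Counter; \$main::loads++; 1;\n";
close $fh or die "cannot close $file: $!";

our $loads = 0;
unshift @INC, $dir;

require Counter;          # loaded and compiled; %INC gets the key 'Counter.pm'
require Counter;          # already in %INC, so the file is not run again
my $after_require = $loads;

do 'Counter.pm';          # do() ignores %INC and runs the file again
my $after_do = $loads;

print "after two requires: $after_require, after do: $after_do\n";
print "Counter.pm => $INC{'Counter.pm'}\n";
```

The counter goes up once for the two require calls together and once more for do, which matches the description of do() as unconditional loading.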

The following examples will make it easier to understand the logic.

First, let's see what the contents of @INC are on my system:


% perl -e 'print join "\n", @INC'
/usr/lib/perl5/5.00503/i386-linux
/usr/lib/perl5/5.00503
/usr/lib/perl5/site_perl/5.005/i386-linux
/usr/lib/perl5/site_perl/5.005
.

Notice that . (current directory) is the last directory in the list.

Now let's load the module strict.pm and see the contents of %INC:


% perl -e 'use strict; print map {"$_ => $INC{$_}\n"} keys %INC'

strict.pm => /usr/lib/perl5/5.00503/strict.pm

Since strict.pm was found in the /usr/lib/perl5/5.00503/ directory, and /usr/lib/perl5/5.00503/ is a part of @INC, %INC includes the full path as the value for the key strict.pm.

Now let's create the simplest module in /tmp/test.pm:


test.pm
-------
1;

It does nothing, but returns a true value when loaded. Now let's load it in different ways:


% cd /tmp
% perl -e 'use test; print map {"$_ => $INC{$_}\n"} keys %INC'

test.pm => test.pm

Since the file was found relative to . (the current directory), the relative path is inserted as the value. If we alter @INC by adding /tmp to the end:


% cd /tmp
% perl -e 'BEGIN{push @INC, "/tmp"} use test; \
print map {"$_ => $INC{$_}\n"} keys %INC'

test.pm => test.pm

Here we still get the relative path, since the module was found first relative to ".". The directory /tmp was placed after . in the list. If we execute the same code from a different directory, then the "." directory won't match,


% cd /
% perl -e 'BEGIN{push @INC, "/tmp"} use test; \
print map {"$_ => $INC{$_}\n"} keys %INC'

test.pm => /tmp/test.pm

so we get the full path. We can also prepend the path with unshift(), so it will be used for matching before "." and therefore we will get the full path as well:


% cd /tmp
% perl -e 'BEGIN{unshift @INC, "/tmp"} use test; \
print map {"$_ => $INC{$_}\n"} keys %INC'

test.pm => /tmp/test.pm

The code:


BEGIN{unshift @INC, "/tmp"}

can be replaced with the more elegant:


use lib "/tmp";

Which is almost equivalent to our BEGIN block and is the recommended approach.
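
Like the BEGIN block, use lib takes effect at compile time, and it can be undone with no lib. The following sketch records what @INC looks like at each point; the variable names are made up for the example, and /tmp is reused from the text above.

```perl
use strict;
use warnings;

# Both counters are filled in at compile time, inside BEGIN blocks.
my ($seen_in_begin, $seen_after_no_lib);

use lib '/tmp';
BEGIN { $seen_in_begin = grep { $_ eq '/tmp' } @INC }

no lib '/tmp';            # removes every '/tmp' entry from @INC again
BEGIN { $seen_after_no_lib = grep { $_ eq '/tmp' } @INC }

print "after use lib: $seen_in_begin entry, after no lib: $seen_after_no_lib entries\n";
```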

These approaches to modifying @INC can be labor intensive, since if you want to move the script around in the file-system, then you have to modify the path. This can be painful, for example, when you move your scripts from development to a production server.

There is a module called FindBin that solves this problem in the plain Perl world, but, unfortunately, it won't work under mod_perl, since it's a module, and as any module, it's loaded only once. So the first script using it will have all the settings correct, but the rest of the scripts will not if they're in a different directory from the first.

For the sake of completeness, I'll present this module anyway.

If you use this module, then you don't need to write a hard-coded path. The following snippet does all the work for you (the file is /tmp/load.pl):


load.pl
-------
#!/usr/bin/perl

use FindBin ();
use lib "$FindBin::Bin";
use test;
print "test.pm => $INC{'test.pm'}\n";

In the above example, $FindBin::Bin is equal to /tmp. If we move the script somewhere else, e.g. to /tmp/x, then $FindBin::Bin in the code above equals /tmp/x.


% /tmp/load.pl

test.pm => /tmp/test.pm

This is just like use lib except that no hard-coded path is required.
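
A frequent variation, shown here only as a sketch, points use lib at a directory next to the script instead of the script directory itself. The lib subdirectory name is an assumption for illustration and does not come from the example above.

```perl
use strict;
use warnings;
use FindBin ();
use File::Spec;

# Hypothetical layout: <script dir>/lib holds the project's modules.
# The path is built in a BEGIN block so it is ready when "use lib" runs.
my $lib_dir;
BEGIN { $lib_dir = File::Spec->catdir($FindBin::Bin, 'lib') }
use lib $lib_dir;

print "script directory: $FindBin::Bin\n";
print "added to \@INC:    $lib_dir\n";
```

The script and its lib directory can then be moved together without editing any path.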

You can use this workaround to make it work under mod_perl.


do 'FindBin.pm';
unshift @INC, "$FindBin::Bin";
require test;
#maybe test::import( ... ) here if need to import stuff

This has a slight overhead, because it will load from disk and recompile the FindBin module on each request. So it may not be worth it.



  1. -I /Users/xx/perl_lib (perl command-line switch)
  2. BEGIN {
    push @INC,"/Users/xx/perl_lib";}
  3. export PERL5LIB=/Users/xx/perl_lib
  4. use lib "/Users/xx/perl_lib";
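
To check which @INC entries came from the PERL5LIB environment variable, something like the following sketch could be used. split_path_list is a hypothetical helper, the comparison assumes that directories appear in @INC exactly as they were written in PERL5LIB, and the // operator requires Perl 5.10 or later.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: split a PATH-style list on the given separator,
# dropping empty entries.
sub split_path_list {
    my ($value, $sep) = @_;
    return grep { length } split /\Q$sep\E/, ($value // '');
}

# $Config{path_sep} is ":" on Unix and ";" on Windows.
my %from_env = map { $_ => 1 } split_path_list($ENV{PERL5LIB}, $Config{path_sep});

for my $dir (@INC) {
    next if ref $dir;     # skip code-ref hooks
    printf "%-10s %s\n", ($from_env{$dir} ? 'PERL5LIB' : 'other'), $dir;
}
```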

Thursday, March 19, 2009

Gliffy, an online flowchart-drawing site

http://www.gliffy.com/
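A website offering online diagramming: you can draw block diagrams, flowcharts, network topology diagrams and more in the browser, much like Microsoft Visio or OmniGraffle on the Mac. Diagrams can be drawn and saved online and exported as SVG, JPEG or PNG when needed, which is very convenient. Gliffy supports collaborative editing: you can invite friends by email to edit your diagram together.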

Wednesday, March 18, 2009

Sphinx, a documentation tool

A documentation tool written in Python.

Official site:
http://sphinx.pocoo.org

Tutorial:
http://scienceoss.com/use-sphinx-for-documentation/

Tuesday, March 17, 2009

Talk at WSDM by Google guru Jeffrey Dean

Google guru Jeffrey Dean (http://research.google.com/people/jeff/), one of the inventors of the MapReduce framework,
gave a talk at WSDM 09,
Challenges in Building Large-Scale Information Retrieval Systems,
which revealed some details of Google's internal implementation. The keynote slides are now available:
English PDF: http://research.google.com/people/jeff/WSDM09-keynote.pdf
Chinese translation: http://docs.google.com/Present?docid=afdfdfhqkrd8_1098qht7ggj
There is also Greg's commentary (http://glinden.blogspot.com/2009/02/jeff-dean-keynote-at-wsdm-2009.html).

Videos of the WSDM 09 talks: http://videolectures.net/wsdm09_barcelona/

Worth a look when you have time.