Tuesday, June 8, 2010

A rare academic exchange opportunity at the company

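Following Professor Shing-Tung Yau's recent lecture, this Thursday afternoon several experts from the Swiss Federal Institute of Technology in Lausanne (EPFL) and the Hong Kong University of Science and Technology (HKUST) will give three talks for Alibaba colleagues.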

Talk topics (all talks will be given in Chinese):
1. Impact of Recommenders on Consumer Behaviors (Prof. Pearl Pu 浦还珠, EPFL, Human Computer Interaction Group, Faculty of Information and Communication Sciences)
2. Exploiting the Collective Wisdom in Recommendation Systems (Nathan Nan Liu, HKUST)
3. Transfer Learning with Applications (Prof. Qiang Yang 杨强, HKUST, IEEE Fellow)

Appendix:
1. Speakers and talk abstracts
(Topic 1) Impact of Recommenders on Consumer Behaviors (Prof. Pearl Pu 浦还珠, EPFL, Human Computer Interaction Group, Faculty of Information and Communication Sciences)
Speaker bio:
Dr. Pearl Pu (浦还珠) is the director of the Human Computer Interaction Group at the Swiss Federal Institute of Technology in Lausanne (EPFL), where she teaches and conducts research in HCI and consumer decision behaviors. She was recently elected general chair of the ACM International Conference on Intelligent User Interfaces (IUI 2011) and the ACM International Conference on Recommender Systems (RecSys 2008), and program co-chair of the ACM Conference on Electronic Commerce (EC 2009) and Adaptive Hypermedia and Adaptive Web-Based Systems (AH 2008). A native of Shanghai, she moved to the United States shortly after passing the entrance examination to Zhejiang University. She obtained her Master's and Ph.D. degrees from the University of Pennsylvania in artificial intelligence and computer graphics. She was a visiting scholar at Stanford University in 2001 and more recently at HKUST in 2010. She has consulted for many online companies on recommender system design, mobile interfaces, and personalized product search.
Abstract:
As online stores offer practically infinite shelf space, recommender systems are playing an increasingly important role in helping users search for and discover items that they may want to buy. In contrast to the proliferation of personalized Web services used in online industries and the widespread publication on the algorithmic success of recommenders, little is known about the effects of recommenders on consumer decision behaviors, for example, “which items should be recommended to influence consumers’ basket construction?” In this talk, I present some results based on empirical work that we have conducted in understanding and evaluating the impact of recommender systems on consumer behaviors.
---
(Topic 2) Exploiting the Collective Wisdom in Recommendation Systems (Nathan Nan Liu, HKUST)
Speaker bio:
Liu Nan is a PhD candidate in the Department of Computer Science and Engineering at HKUST, working with Qiang Yang. His current work focuses on machine learning and data mining techniques with applications to recommendation systems. He has published papers at SIGIR, CIKM and WWW, has served as a PC member for EMNLP'09, AAAI'10 and KDD'10, and is a guest editor for the IEEE Intelligent Systems Special Issue on Social Learning.
Abstract:
Collaborative filtering is a powerful technology for making recommendations based on the behaviors of a massive number of users. It is well known that some of the most successful internet services, such as Amazon and Digg, rely heavily on collaborative filtering to make recommendations to their users. In this talk, we will present an overview of several new research directions in collaborative filtering that are being pursued at HKUST. Firstly, we present ranking-based models for collaborative filtering that directly optimize the quality of the top-k recommendation list. Secondly, we discuss how to exploit implicit user feedback such as clicks and purchases, in addition to the more traditional explicit user feedback, which normally takes the form of ratings. Thirdly, we present distributed algorithms for scaling matrix factorization models to massive datasets. Finally, we show how to more accurately combine user behaviors observed across multiple highly diverse domains via transfer learning.
---
(Topic 3) Transfer Learning with Applications (Prof. Qiang Yang 杨强, HKUST, IEEE Fellow)
Speaker bio:
Qiang Yang is a professor in the Department of Computer Science and Engineering, Hong Kong University of Science and Technology, and an IEEE Fellow. His research interests are in artificial intelligence, including automated planning, machine learning and data mining. He graduated from Peking University in 1982 with a BSc in Astrophysics, and obtained MSc degrees in Computer Science and in Astrophysics from the University of Maryland, College Park in 1985 and 1987, respectively, as well as his PhD in Computer Science from the University of Maryland, College Park in 1989. He was an assistant/associate professor at the University of Waterloo between 1989 and 1995, and a professor and NSERC Industrial Research Chair at Simon Fraser University in Canada from 1995 to 2001. He is a member of AAAI, AAAS and ACM. He is an author of two books and over 200 publications on AI and data mining. His research teams won the 2004 and 2005 ACM KDDCUP international competitions on data mining. He was an invited speaker at IJCAI 2009, ACL 2009 and ACML 2009.
Qiang Yang is on the editorial boards of several international journals.  He is the founding Editor in Chief of the ACM Transactions on Intelligent Systems and Technology (ACM TIST).  He is on the editorial board of IEEE Intelligent Systems and Journal of Web Intelligence.  Previously he has been an associate editor for IEEE Transactions on Knowledge and Data Engineering, and Journal of Knowledge and Information Systems. He has been an organizer for several international conferences in AI and data mining, including the PC co-chair for ACM KDD 2010, the conference co-chair for ACM IUI 2010, Tutorial co-chair for AAAI 2005/2006, Workshop chair for ACM KDD 2007, program co-chair for PRICAI 2006 and PAKDD 2007, data mining contest chair for IEEE ICDM 2007/2009, vice chair for ICDM 2006 and CIKM 2009, conference chair for ICCBR 2001 and PC co-chair for Canadian AI conference in 2000. His home page is at http://www.cse.ust.hk/~qyang
Abstract:
Transfer learning is a new machine learning and data mining framework that allows the training and future data to come from different distributions or feature spaces. We can find many novel applications of machine learning and data mining where transfer learning is necessary. In this talk, I will give an introduction to transfer learning and then highlight some important applications such as text and image classification, sensor data mining and activity recognition, collaborative filtering and bioinformatics.  I will also discuss some potential future directions of transfer learning.

Found a pretty impressive Taobao bookstore

http://weipipi.taobao.com

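It carries a lot of foreign-language books; I'm not sure whether they are photocopies.
The math and physics selection is quite complete. I plan to buy one to try it out.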

Sunday, April 19, 2009

Perl built-in special variables

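$- number of lines left to print on the current page; part of Perl's format system
  $! error number or error string, depending on context
  $" list separator used when an array is interpolated into a string
  $# default output format for printed numbers (deprecated; no longer supported in modern Perl)
  $$ process ID of the Perl interpreter
  $% current page number of the currently selected output channel
  $& string matched by the last successful pattern match
  $( real group ID of the current process
  $) effective group ID of the current process
  $* set to 1 to enable multi-line matching (deprecated; use the /s and /m modifiers instead)
  $, output field separator for print
  $. current input line number of the last filehandle read
  $/ input record separator; newline by default
  $: characters after which a string may be broken to fill continuation fields in a format
  $; subscript separator used when emulating multidimensional arrays
  $? status returned by the last external command
  $@ error message from the last eval
  $[ index of the first element in an array
  $\ output record separator for print
  $] version number of the Perl interpreter
  $^ name of the current top-of-page format for the current output channel
  $^A accumulator that holds formatted data before it is printed
  $^D current value of the debugging flags
  $^E extended operating-system error information (useful on non-UNIX platforms)
  $^F maximum system file descriptor
  $^H compile-time hints set by the compiler (syntax-check state)
  $^I value of the in-place edit extension (the -i switch)
  $^L form-feed character sent to the output channel
  $^M emergency memory pool buffer
  $^O operating system name
  $^P internal variable holding the current debugging flags
  $^R result of the last evaluated regular-expression code block
  $^S current state of the interpreter
  $^T time at which the script started running, in seconds since the epoch
  $^W current value of the warning switch
  $^X name of the Perl binary being executed
  $_ default input, output, and pattern-matching space
  $| controls buffering on the currently selected output filehandle
  $~ name of the current report format
  $` string preceding the last successful pattern match
  $' string following the last successful pattern match
  $+ text matched by the last bracket of the last successful search pattern
  $< real user ID of the user running the interpreter
  $1, $2, ... contents of the corresponding capture group (parentheses) from the last successful pattern match
  $= number of printable lines on the current page (the page length)
  $> effective user ID of the current process
  $0 name of the script being executed
  $ARGV name of the current file when reading from the default filehandle
  %ENV hash of environment variables
  %INC files that have been included via do or require
  %SIG signals and their handlers
  @_ arguments passed to a subroutine
  @ARGV command-line arguments passed to the script
  @INC directories searched when loading modules
  $-[0] and $+[0] start and end offsets of the last successful match within the matched string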
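
A few of the variables above are easier to remember once you see them in action. The following is only a small sketch: the sample string, the array and the hash name %m are made up for illustration.

```perl
use strict;
use warnings;

my $text = "hello world";
my %m;
if ($text =~ /(w\w+)/) {
    %m = (
        prematch  => $`,      # text before the match
        match     => $&,      # the matched text
        postmatch => $',      # text after the match
        group1    => $1,      # first capture group
        start     => $-[0],   # offset where the match starts
        end       => $+[0],   # offset just past the end of the match
    );
}
print "$_ => '$m{$_}'\n" for sort keys %m;

my @items = (1, 2, 3);
my $joined;
{
    local $" = "-";           # list separator used when interpolating arrays
    $joined = "@items";
}
print "$joined\n";

{
    local $, = ", ";          # output field separator for print
    local $\ = "\n";          # output record separator for print
    print "a", "b", "c";
}

# $@ holds the error message of the last failed eval
my $ok = eval { die "boom\n"; 1 };
print "eval failed with: $@" unless $ok;

print "pid $$ running $0 on $^O\n";
```

Using local keeps the changed separators confined to their block, so the rest of the script still prints normally.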

Monday, March 30, 2009

[perl] Several ways to modify @INC

The @INC Array

@INC is a special Perl variable that is the equivalent to the shell's PATH variable. Whereas PATH contains a list of directories to search for executables, @INC contains a list of directories from which Perl modules and libraries can be loaded.

When you use(), require() or do() a filename or a module, Perl gets a list of directories from the @INC variable and searches them for the file it was requested to load. If the file that you want to load is not located in one of the listed directories, then you have to tell Perl where to find the file. You can either provide a path relative to one of the directories in @INC, or you can provide the full path to the file.

The %INC Hash

%INC is another special Perl variable that is used to cache the names of the files and the modules that were successfully loaded and compiled by use(), require() or do() statements. Before attempting to load a file or a module with use() or require(), Perl checks whether it's already in the %INC hash. If it's there, then the loading and therefore the compilation are not performed at all. Otherwise, the file is loaded into memory and an attempt is made to compile it. do() does unconditional loading -- no lookup in the %INC hash is made.

If the file is successfully loaded and compiled, then a new key-value pair is added to %INC. The key is the name of the file or module as it was passed to one of the three functions we have just mentioned. If it was found in any of the @INC directories except ".", then the value is the full path to it in the file system.
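
The caching behaviour can be checked with a small experiment. This is a sketch under assumptions: the module name IncDemo and its load counter are invented for the demonstration, and the module is written to a temporary directory so nothing on your system is touched.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical module, written to a throwaway directory.
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'IncDemo.pm');
open(my $out, '>', $file) or die "cannot write $file: $!";
print {$out} "package IncDemo;\n\$IncDemo::loads++;\n1;\n";
close($out) or die "cannot close $file: $!";

unshift @INC, $dir;

require IncDemo;          # first load: the file is compiled, %INC gets an entry
require IncDemo;          # second call: found in %INC, the file is not read again
my $after_require = $IncDemo::loads;

do 'IncDemo.pm';          # do() skips the %INC check and loads the file again
my $after_do = $IncDemo::loads;

print "IncDemo.pm => $INC{'IncDemo.pm'}\n";
print "loads after two require calls: $after_require\n";
print "loads after an extra do():     $after_do\n";
```

The counter goes up only once for the two require calls and once more for do(), which matches the description above.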

The following examples will make it easier to understand the logic.

First, let's see the contents of @INC on my system:


% perl -e 'print join "\n", @INC'
/usr/lib/perl5/5.00503/i386-linux
/usr/lib/perl5/5.00503
/usr/lib/perl5/site_perl/5.005/i386-linux
/usr/lib/perl5/site_perl/5.005
.

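Notice that . (current directory) is the last directory in the list. (Perl 5.26 and later no longer include . in @INC by default, so the examples below that rely on it assume an older Perl.)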

Now let's load the module strict.pm and see the contents of %INC:


% perl -e 'use strict; print map {"$_ => $INC{$_}\n"} keys %INC'

strict.pm => /usr/lib/perl5/5.00503/strict.pm

Since strict.pm was found in /usr/lib/perl5/5.00503/ directory and /usr/lib/perl5/5.00503/ is a part of @INC, %INC includes the full path as the value for the key strict.pm.

Now let's create the simplest module in /tmp/test.pm:


test.pm
-------
1;

It does nothing, but returns a true value when loaded. Now let's load it in different ways:


% cd /tmp
% perl -e 'use test; print map {"$_ => $INC{$_}\n"} keys %INC'

test.pm => test.pm

Since the file was found relative to . (the current directory), the relative path is inserted as the value. If we alter @INC by adding /tmp to the end:


% cd /tmp
% perl -e 'BEGIN{push @INC, "/tmp"} use test; \
print map {"$_ => $INC{$_}\n"} keys %INC'

test.pm => test.pm

Here we still get the relative path, since the module was found first relative to ".". The directory /tmp was placed after . in the list. If we execute the same code from a different directory, then the "." directory won't match,


% cd /
% perl -e 'BEGIN{push @INC, "/tmp"} use test; \
print map {"$_ => $INC{$_}\n"} keys %INC'

test.pm => /tmp/test.pm

so we get the full path. We can also prepend the path with unshift(), so it will be used for matching before "." and therefore we will get the full path as well:


% cd /tmp
% perl -e 'BEGIN{unshift @INC, "/tmp"} use test; \
print map {"$_ => $INC{$_}\n"} keys %INC'

test.pm => /tmp/test.pm

The code:


BEGIN{unshift @INC, "/tmp"}

can be replaced with the more elegant:


use lib "/tmp";

Which is almost equivalent to our BEGIN block and is the recommended approach.
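
The "almost" is there because use lib does a little more than a plain unshift: it also adds architecture-specific subdirectories when they exist, and it has a counterpart, no lib, that removes the path again. A minimal sketch, assuming a temporary directory in place of /tmp and calling the lib module at run time so the directory can be created first:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);   # throwaway directory standing in for /tmp

require lib;
lib->import($dir);                 # the run-time equivalent of: use lib $dir;
my $first_after_import = $INC[0];
print "first \@INC entry: $first_after_import\n";

lib->unimport($dir);               # the run-time equivalent of: no lib $dir;
my $still_listed = grep { $_ eq $dir } @INC;
print "still listed after unimport: ", ($still_listed ? "yes" : "no"), "\n";
```

In a real script you would normally write use lib "/some/path"; directly, which runs at compile time just like the BEGIN block.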

These approaches to modifying @INC can be labor intensive, since if you want to move the script around in the file-system, then you have to modify the path. This can be painful, for example, when you move your scripts from development to a production server.

There is a module called FindBin that solves this problem in the plain Perl world, but, unfortunately, it won't work under mod_perl, since it's a module, and as any module, it's loaded only once. So the first script using it will have all the settings correct, but the rest of the scripts will not if they're in a different directory from the first.

For the sake of completeness, I'll present this module anyway.

If you use this module, then you don't need to write a hard-coded path. The following snippet does all the work for you (the file is /tmp/load.pl):


load.pl
-------
#!/usr/bin/perl

use FindBin ();
use lib "$FindBin::Bin";
use test;
print "test.pm => $INC{'test.pm'}\n";

In the above example, $FindBin::Bin is equal to /tmp. If we move the script somewhere else, e.g. to /tmp/x, then $FindBin::Bin in the code above equals /tmp/x.


% /tmp/load.pl

test.pm => /tmp/test.pm

This is just like use lib except that no hard-coded path is required.
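
A common variation is to keep modules in a subdirectory next to the script. The sketch below assumes such a layout (a directory named lib beside the script, which is my invention, not something from the text above) and adds it at run time; with use lib the same path would be added at compile time instead.

```perl
use strict;
use warnings;
use FindBin ();
use File::Spec;

# Hypothetical layout: modules live in a "lib" directory next to the script.
my $lib_dir = File::Spec->catdir($FindBin::Bin, 'lib');

# Prepend it so it is searched before the system directories.
unshift @INC, $lib_dir;

print "script directory: $FindBin::Bin\n";
print "first \@INC entry: $INC[0]\n";
```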

You can use this workaround to make it work under mod_perl.


do 'FindBin.pm';
unshift @INC, "$FindBin::Bin";
require test;
#maybe test::import( ... ) here if need to import stuff

This has a slight overhead, because it will load from disk and recompile the FindBin module on each request. So it may not be worth it.



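To sum up, there are four ways to add a directory to @INC:

  1. perl -I /Users/xx/perl_lib (the -I command-line switch, capital i)
  2. BEGIN {
    push @INC,"/Users/xx/perl_lib";}
  3. export PERL5LIB=/Users/xx/perl_lib
  4. use lib "/Users/xx/perl_lib";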
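
The two methods that work from outside the script (the -I switch and PERL5LIB) can be checked by starting a child perl and looking at its @INC. This is only a sketch: the helper child_inc is made up for the demonstration, and a temporary directory stands in for /Users/xx/perl_lib.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);   # stands in for /Users/xx/perl_lib

# Hypothetical helper: run a child perl and return its @INC as a list.
sub child_inc {
    my @switches = @_;
    open(my $fh, '-|', $^X, @switches, '-e', 'print "$_\n" for @INC')
        or die "cannot start $^X: $!";
    chomp(my @inc = <$fh>);
    close($fh);
    return @inc;
}

my @with_flag = child_inc("-I$dir");            # method 1: the -I switch

my @with_env = do {
    local $ENV{PERL5LIB} = $dir;                # method 3: PERL5LIB
    child_inc();
};

printf "-I:       %s\n", (grep { $_ eq $dir } @with_flag) ? "found" : "missing";
printf "PERL5LIB: %s\n", (grep { $_ eq $dir } @with_env)  ? "found" : "missing";
```

Methods 2 and 4 live inside the script itself, so they travel with the code, while -I and PERL5LIB are handy when you cannot or do not want to edit the script.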

Thursday, March 19, 2009

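Gliffy: an online flowchart-drawing site

http://www.gliffy.com/
A site that lets you draw diagrams online: block diagrams, flowcharts, network topology diagrams and so on, similar to Microsoft Visio or OmniGraffle on the Mac. You can draw and save diagrams right in the browser and export them as SVG, JPEG or PNG when needed, which is very convenient. Gliffy supports collaborative editing: you can invite friends by email to edit your diagram together.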

Wednesday, March 18, 2009

Sphinx: a tool for writing documentation

A documentation tool written in Python.

Official site:
http://sphinx.pocoo.org

Tutorial:
http://scienceoss.com/use-sphinx-for-documentation/

Tuesday, March 17, 2009

Google guru Jeffrey Dean's talk at WSDM

Google guru Jeffrey Dean (http://research.google.com/people/jeff/), one of the inventors of the MapReduce framework,
gave a talk at WSDM 09,
Challenges in Building Large-Scale Information Retrieval Systems,
which revealed some details of Google's internal implementation. The keynote slides are now available:
English PDF: http://research.google.com/people/jeff/WSDM09-keynote.pdf
Chinese translation: http://docs.google.com/Present?docid=afdfdfhqkrd8_1098qht7ggj
See also Greg's commentary (http://glinden.blogspot.com/2009/02/jeff-dean-keynote-at-wsdm-2009.html).

Videos of the WSDM 09 talks: http://videolectures.net/wsdm09_barcelona/

Worth a look when you have time.