TCS-DB-er: 2009

Wednesday, December 2, 2009

the power of operators

I want to discuss a little bit about a very useful tool in analysis, the operators.

The theory of operators is an entire area of which I only know some basics.
However, even basic knowledge about it can simplify things quite a bit and sometimes makes impossible things possible. I will just discuss one nice cute application.

Suppose we want to compute


\int xe^x dx

This is very simple: we just do integration by part,


\int xe^x dx = \int x de^x = xe^x -\int e^x dx = xe^x -e^x +C

However, what if we want to compute


\int P(x)e^x dx

where P(x) is a polynomial of degree n?

After a moment reflection, you might want to say that you can repeat integration by part n times and each time the deg of the polynomial inside the integral is reduced by one. Yes, but if you want to do this by hand, this will cost you a big piece of paper and xxx minutes before you find out how the terms evolve in each iteration and conclude the result by induction.

However, with the help of operators, it is just a piece of cake:

Let us us D be the differential operator, i.e. Df = f'.

Then 1/D is the integral operator.

It is a known fact that for any integer k,


D^k (e^{ax}y)=e^{ax}(D+a)^k y

Therefore,


\int P(x)e^{ax} dx = \frac{1}{D} e^{ax} P(x)= e^{ax} \frac{1}{D+a} P(x)

= \frac{e^{ax}}{a} \frac{1}{1+D/a} P(x) = \frac{e^{ax}}{a} (1-D/a +D^2/a^2 - D^3/a^3 \ldots)P(x)

= \frac{e^{ax}}{a} (P(x)-P'(x)/a +P''/a^2 - P^{(3)}/a^3 \ldots)

Notice that we only need the terms up to D^n ( since taking n+1 times differentiation of a poly of degree n

will result in 0).

Now, we can see every term is very simple to compute.

Actually, this integral appears quite often in various contexts, for eg. Laplace transform, Fourier analysis, moment generating function, where we often need to convolute functions with exponentials..

Write latex in blogger

It is very simple. Please check this out for the details.

http://www.botcyb.org/2008/10/rendering-latex-in-blogger.html

Here is how it looks..

\int_{0}^{1}\frac{x^{4}\left(1-x\right)^{4}}{1+x^{2}}dx =\frac{22}{7}-\pi

Your browser should allow javascript in order to show the rendered formula correctly.
Unfortunately, it doesn't seem to work in google reader....:(

Wednesday, October 14, 2009

A funny bio

Mani Soma, A prof at UW, IEEE fellow

http://faculty.washington.edu/manisoma/

Read his Biosketch...:)

Monday, October 5, 2009

Jeff Erickson's list on "what a theorectical computer science student should know"

Things happened Last week included Mihai Pătraşcu's (yes, it was him again!) this contraversial blog post with 74+ comparably interesting comments! I highly recommend you to follow it a bit if you haven't done so.

I would like to quote today a comment by Jeff Erickson on related post.
It is a (a suprisingly long and diverse) list on "what he thinks a phd in theorectical computer science should know". Erickson is a computational geometrist. That is why his list contains a few of geometry results (or algebraic results that is useful in algebraic geometry?) that I never heard of, like persistent homology, de Casteljau's algorithm...But other than that , i think it is quite fair list. Check how many you haven't heard of and wiki them! :)

here is Erickson's (I think it is him) comment on this post "core TCS" (by Michael Mitzenmacher):
"I'm sure my answer will be horribly biased, but at the very least, it should touch on Voronoi diagrams, Van Emde Boas trees, the dynamic optimality conjecture, Vapnik-Chernovenkis dimension, the Immerman–Szelepcsényi theorem, the Lovasz Local Lemma, spectral partitioning, Bloom filters, boosting, Arrow’s paradox, Nash equilibria, Azuma's inequality, zero-knowledge proofs, the PCP theorem, smoothed analysis, coresets, Reed-Solomon codes, distributed cache consistency protocols, interior-point methods, persistent homology, AKS sorting networks, monadic second-order logic, snake sort, evasive graph properties, the Okamura-Seymour theorem, natural proofs, Ramsey theory, fractional cascading, communication complexity, doubling dimension, derandomization, soft heaps, the power of two choices, the Graph Minor Theorem, Krylov subspace iteration, Euclid's algorithm, Dijkstra's algorithm, Boruvka's algorithm, Dehn's algorithm, Grover's algorithm, Buchberger's algorithm, de Casteljau's algorithm, Khachiyan's algorithm, Bresenham's algorithm, Dinits's algorithm, Lloyd's algorithm, and how to quickly look up lots of impressive-sounding results on Wikipedia."

I would like to add a few things I really like and think are equally important/elegent:
Dynamic programming speedup and SMAWK algorithm, expander, Karp-Luby and Karp-Luby-Madras , Alon-Matias-Szegedy algorithm, regularity lemma, perfect graph theorems, treewidth, Baker's shifting technique, Digital fountain, Lovasz's Sandwich theorem... I can only recall this many at this point and probably will update it later.
Also I would like to see a similar list for DB students. Hellerstein and Stonebraker 's list might be a good starting point.

Monday, September 28, 2009

TAOCP and the inversion table of a permutation.

Perhaps every body in Computer science knows Knuth's volumes of "the art of computer programming".
Since I was not a in a cs undergraduate program, I knew this book even before
any introductory algorithm book like "Introduction to Algorithms".
Partly due to the comment made by Bill Gate on the back of the 1st volumes: "If you think you're a really good programmer . . . read (Knuth's) Art of Computer Programming . . . You should definitely send me a resume if you can read the whole thing.", I started reading the book, as my very first book on algorithms (I had read some books about c and vb programming before, but I wouldn't count them as algorithm books).
It was like 5 years back, I could only read 2-3 pages per day. Eventually I finished two volumes (the 1st and 3rd vol). I still remember many exciting moments when I managed to understand the contents, full of numerous amazing algorithms and beautiful analysis. The book does not only convey the knowledge of which the big vols already contain a incredibly amount,
but also an aesthetic point of view towards algorithms by providing a large number of masterpieces that best illustrate it. I probably should say, the topics discussed in the book is arguably bit old-fashined and these volumes do not appear in papers' ref as frequently in nowadays cs research as 3o years ago. However, many tricks covered in the book can still be very useful.

Today, I will discuss a very cute one, the inversion table for a permutation. The definition is very simple.
Given a permutation P of {1,2,....,n}, its inversion table is a length n sequence
b1,b2,...,bn where bi= #(elements that are bigger than i and appear to the left of i in P.

e.g. P= {591826473}
its inverse table is {236402210}. b1=2 since there are two elements (5 and 9) larger than and on the LHS of 1.

A easy observation is an inversion table uniquely determines a permutation.
It is at times much more convenient to work with inversion tables than original permutations since the space of inversion tables looks much simpler:
actually, every sequence b1b2....bn with 0<=bi<=n-i corresponds to some inversion table!
(Note we also have n*(n-1)*....*1=n! many of them.)

Let us see an application.
Compute the expected number of right-to-left maxima for a random permutation of {1,2,...,n}.
This was one problem in the final exam of our probabilistic method course taught by Aravind Srinivasan.
Direct computation would be rather cumbersome. I remember I spent like 20min on manipulating binomial coefficients and got nowhere...
Let us work with the inversion table b1b2...bn now.
We can see the number i is a right-to-left maxima if and only if bi=n-i (all n-i larger numbers are all to its left).
Also taking a random permutation is equivalent to taking a random inversion table since the 1-1 correspondence.
Therefore,
E= \sum_{i=1}^n Pr(i is a right-left maximum)=\sum_{i=1}^n Pr(bi=n-i)= \sum_{i=1}^n 1/(n-i+1) = H_n (the n-th harmonic number).

Cute, isn't it!

Another solution I know is to use the following recurrence:

E[#maxima w.r.t {1...n}] = \sum_{i=1}^n Prob(first element is i) (E[#maxima w.r.t {i+1...n}]+1)
which reduces to the recursion
E(n)= \sum_{i=0}^{n-1}(E(n-i)+1)/n.
The nth harmonich number H_n solves this recursion.
This is as simple, but way less cute, I think.

Sunday, August 30, 2009

VLDB09 - journal like review process.

This year's vldb was held in Lyon, France. I came back from there yesterday night. The jet-lag woke me at 5pm...so I have some time to do some notes here.

The first thing i want to note is the undergoing change of the review process of Proc. of VLDB. This year' s vldb includes a couple of (<=10) papers that have gone through a journal like review process(multiple rounds), rather than the traditional one round thing for conference. Next year's proceeding will include 10% of such papers. From 2011, there will be no traditional conference PC and everything will be shifted to the journal-like track.

I talked to quite a few of my friends. Most of them did not seem quite enthusiastic about the transition. I guess the reason might be that they are able to get papers accepted anyway and the multi-round process only implies an increasesing workload but not a higher chance to them. Literally quote (translated into english) some of their words "I bet the review will be 'more experiments on xxx', 'more references on xxx' , ' more xxx on xxx' "; "more experiments, on a new dataset, these are only easy to say"," the worst thing could happen is that a reviewer didn't like the paper intially but asked for a major revision rather than gave a outright reject. Then the paper was turned around multiple times, the content was rewritten and exp was redone and the reviewer still didn't like the paper and finally rejected it"......I have a few journal papers and went through the multiround process myself twice. Fortunately, my experience was not that bad. The first paper was submitted to transaction on information theory. we got two quite positive reviews. One review was like 5 pages long with numerous detailed comments, but each of those was fairly easy to fix and the paper got accepted after that round. The other one was a very short paper which i submitted to Information Processing Letter. The process was even smoother. I got two reviews and one of them only had one sentence for recommending acceptence. Responsing the other review took me less than two hours and the paper was accetped after this round. I hope this new vldb journal thing can improve the paper quality substantially without bringing us too much pain....

Saturday, August 22, 2009

Some random notes from MPI

Today is the last day of my stay in Saarbrucken, Germany. During the last two weeks, I have been here visiting Julian Mestre, a former student of my current advisor Samir Khuller.

We have been working on a stochastic optimization problem and already had some exciting progress. Cooperating with Julian is a lot fun. We have a big overlap of our background, which makes the catchup very easy, meanwhile our thinking patterns are quite different and complements to each other well. (As I observed) Julian tends to start off from concrete examples (worst/average/best case examples), making insightful observation and then generalizing until the discover of the solution which works for all case. The way I think usually begins with some very vague framework and then filling in details and testing whether it works for concrete examples. if it didn't work, then try to patch it if possible or just scratch it and start from a new framework or just find a new problem to consider...
I will write some thing more about the interesting problem we are considering at some later point when appropiate.

But I would like to make some notes for some random stuffs I learned during the visit.

1) the power of two choices: consider the following random experiment: throw n balls randomly into n bins (by randomly, i mean for each ball, uniformly and randomly choose a bin to put the ball.) It is a well known fact that the expected maximum load of any bin is O(logn/loglogn). This means this purely random procedure is not a very good scheme to achieve load balancing, albeit it has already been used in many places for this purpose.

However, if instead of completely random throw of each ball, we randomly pick two bins and throw the ball into the bin with smaller load, the expected maximum load becomes O(loglogn). With only one more choice and very limited rationalness (make judgement only between two choices), we can achieve a exponential improvement. Fascinating!

For more general results and references of its numerous applications, see Michael Mitzenmacher's phd thesis.

2)Compressed Sensing: this is now a super hot topic in ECE (also some branches of CS related to signal, compression, etc.) during the recent breaktrhough by Candes, Romberg, Tao and Donoho. But it seems to me not many TCS and DB people (my impression)are high on this topic (I am only aware Cormode and Muthukrishnan have done some work on this). I read through a basic tutorial and browse a few paper abstracts on this and got some rough ideas about the field. Not only the theory borrows ideas from a variety of branchs in applied math and nicely fit them togther, but the practical implication are also remarkable. Many traditional compression schemes will be (or have been) rewritten due to the new developed theory and techniques. My question is whether it can be applied to the approximate query processing and/or query optimization in some way to achieve a better performance than wavelet which has been extensively used for those applications.

3) A question I don't have a good answer: Raghav told me this cute problem when he was reading the book '' the probabilistic method'' by Alon and Spencer. It is a exercise problem on the first chapter.

x and y are two independent identically distributed random variable.
Prove: Pr(|x-y|<=1) >= Pr (1<=x-y<=2) (no absolute value in RHS) I suspect the following (tighter generalization) is also true Pr(0<=x-y<=t) >= Pr(a<=x-y<=a+t) for any a,t>=0

If you try to do it in a brute-force way, it boils down to proving an fairly clean inequality.
But the inequality seems not quite so easy to prove. I have some idea how to prove it, but it is pretty messy and i don't have enough incentive to figure out the details.
Since it is only an exercise problem in the first chapter, I believe the problem should have a very simple but tricky solution. Let me know if you know the solution.

Tuesday, August 4, 2009

The second day.

The first talk was by Nitin. Some basic knowledge about wireless networks - however, it is completely new to me.

Then, Dina Katabi gave a talk about network coding, actually mainly about her recent sigcomm papers. Her talk was very light and easy to follow. Some gossip:). There was post a while ago on mitbbs computer science board. The post raised the following "interesting" question "who is the prettiest cs phd female student you've even seen." Many candidates were proposed and numerous argument ensued , google photo search was extensively used, personal stories one after another, ppl's aesthetics tastes were questioned.......it was a mess all in all. Nevertheless, Dina was consensusly aknowledged as one of the prettiest! I have to say i sat pretty far from her and wouldn't be able to personally justify this claim..:P

I slept over the second network coding talk.

In the afternoon, P.R.Kumar gave a very good talk on the QoS scheduling(this paper). The model proposed is fairly clean and a surprisingly neat neccessary and sufficient condition was proven. The analysis is a ECE type analysis rather the worst case analysis which TCS community usually adopted. In the proof, a very interesting theorem, called blackwell approachability theorem, was used. It was the first time I saw this theorem and it seems to be a pretty useful theorem. The theorem basically describes as follows:

We have a game that goes in iterations. in each iteration t, we can take some action a(t) and will
get a reward v(t).

v(t) is a n-dim random vector s.t.
(1) its distrbution depends only on a(t)
(2) Mean = R(a) (R(a) is a vector)

Let A be a close set in R^n.
We say A is approachable if there is some policy (strategy of taking actions) s.t.
the time average of rewards (\sum_{t\in T}v(t)*(1/T)) will come arbitrarily close to A (or in A) for enough number of iterations with arbitrary high prob.

The Blackwell thm says that
for any x\notin A, if there is an action a with R(a) lies on the other side of the hyperplane H
that passing through the point of A closest of x and perpendicular to line xy
then A is approachable.

Monday, August 3, 2009

the UIUC wireless summer school- first day

Today, we had two EE talks which require substantial background knowledge on communication that i don't have.....:( so I didn't follow much.

I guess tomorrow will be much better since we will have CS talks!

We also had a poster session this afternoon and there were like twenty posters there.
Among those, I want to mention one that is about a very interesting problem, the index coding problem. It is a network coding type problem but the formulation is much cleaner.

The problem describes like this. The source have a {0,1} string X of length n.
There are a bunch of receiver r_i. Each r_i is particular interested in knowing the x_i (th) bit of X. Each r_i also (already) knows a subset of bits of X (this is called the side information).
Find a coding scheme, such that each r_i will know the x_i bit of X and the source broadcasts as few bits as possible ("broadcast" means everybody can hear it).

For a motivating example, suppose each r_i knows X except the x_i(th) bit.
Then the source only need to broadcast the parity of X (number of 1's in X) which takes only 1 bits).

The facinating part of the problem is that the optimal coding scheme can be charaterized by a rank minimization problem on the adjacency matrix of the "side information graph" and also quite related the shannon capacity of a graph and also has similar type of sanwich property.
Refer to this paper for even more interesting details.

Tuesday, July 28, 2009

How fast can you expand a polynomial? (Cont)

I finally get some time to record the O(n^2) time algorithm.
The O(n^2\log^2 n) time algorithm can be found from the appendix of this paper of mine.

Now, let me present an O(n^2) time algorithm.

First, an EXPRESSION can be thought as a tree where all variables and constants are leaves and inner nodes are operations (* and +)

E.g. (x^2+x+1)(2x+1) corresponds to

(Algorithm 1)
1. Choose n+1 different numbers a_1, ...., a_{n+1} .
2. Evaluate the polynomial at these points, i.e., compute F(a_i). (each evaluation takes linear time (bottom-up over the tree). So O(n^2) time in total)
3. Use any O(n^2) polynomial interpolation algorithm to find the coefficient (actually reduces to finding a solution for a linear system).

Cute? Still have to rely on some other nontrivial algorithm to invert the Vandermonde system in O(n^2) time (ref [3][4] in the wiki page).

Let us look at a cuter one.
(Algorithm 2)

Suppose the polynomial if F(x)=\sum_j c_j x^j.
Let e_i be the n-dimensional zero vector except that the ith entry is 1, i.e.,e_i=(0,0,..,1,...,0,0).
Let d_i be the n-dimensional vector which is the DFT(discrete Fourier transformation) of e_i.
Denote u be the nth root of unit and u={1,u,u^2,....,u^{n-1}}.
Therefore, e_i[j]=\sum_k d_i[k] u^{jk}. Equivalently, e_i=\sum_k d_i[k] u^k
where u^k={1,u^k,u^{2k},u^{3k},.....}

Computing all d_i takes O(n^2) time.
Let c be the coefficient vector (of F).
c_i = <c, e_i> (inner product)
<c,e_i>=\sum_k d_i[k]<c, u^k>....................(1)

But what is <c,u^k> ? it is exactly the numerical value of F(u^k)!!! (think about it)

Now, we have our algorithm:
1.compute d_i for all i ....................use DFT formula directly (don't need FFT) ,O(n^2)
2.compute F(u^k) for all k.....................n poly evaluations over complex, O(n^2)
3.use (1) to compute the coefficient.................O(n^2)

So, nothing fancy here(except the idea:). Cuter, right?

OPEN question: can we do better? or the lower bound is O(n^2)

Saturday, July 18, 2009

How fast can you expand a polynomial?

I just came across the following very basic problem.
given a expression of univariable polynomial which is constructed by the following recursion rules
(1) A constant or the variable x is a EXPRESSION.
(2) The sum of two EXPRESSIONS is a EXPRESSION
(3) The product of two EXPRESSION is a EXPRESSION
e.g.
f(x)=((1+x+x^2)(x^2+2x^3)+x^3(2+3x^4))(1+2x)

We know the degree of the polynomial and the length of the EXPRESSION are of sizes O(n).
The question is how fast can you expand the EXPRESSION into a standard form, i.e.,
\sum_{i=0}^n c_i x^i
In an other word, how fast can you compute the coefficients c_i for all i.

It is pretty trivial to achieve O(n^3). (the running time analysis needs a bit counting).
With the help of FFT(fast Fourier transformation), you can do O(n^2\log^2n).
I think I can do O(n^2), but need some nontrivial math...

My question is whether \Omega(n^2) is the lower bound..
or can we do even better?
Could anyone point me out some reference??

Tuesday, July 14, 2009

A Beatiful Science Dream Came TRUE on the 20th Anniversary!

Cold Fusion is proven to be real.
The discovery of an inexhausible new source of energy!!

However, a potential new weapon may also be on its way and it is argued that the ``new weapon'' would be extremely hard to detect and the damage could be massive...

Bless or curse....Let's see...

Check this out
http://goldismoney.info/forums/showthread.php?t=361895

Monday, July 6, 2009

CRAZY summer

(copied from my msn space)

I remember I said in the previous blog I was going to have a crazy summer, but I didn't expect it is this crazy..........

Just passed the SODA deadline a few minute ago.

I had two submissions: one is about a clustering problem with some new interesting constraints and the other is on an interesting generalization of the classic machine scheduling problem. Hopefully, I will get lucky results on a nonempty subset of them....

A few days ago, I was in Providence attending SIGMOD09. Something interesting to say about the conf.

First is the super surprising and exciting news that I got the VLDB09 best paper award. I had never expected that before and was really surprised.

Honestly, it is a bit too much to a student like me who is sort of new in DB and I am not quite ready....Perhas it means to be an alert dictating me to learn more DB..:)

I really enjoyed the conference since I made a lot of friends and met quite a few famous researcher. A new discovery is there are so many prof/student graduated from Fudan are doing DB right now. It is not at all easy to find so many Fudaner outside Fudan and all are doing DB, ranging prof/ass prof/phd/lab researcher/industy ppl.....

It was also a good chance to meet those legendary big name in DB, e.g. micheal stonebreaker, ronald fagin..., they are true bigboss da学霸!

This is the story. When I was doing my PODS talk, fagin was sitting in the first row but I didn't know that was fagin at that time.

I had prepared the talk a couple of times and the time should be just enough. During the talk, fagin kept interrupting and asking me questions. I was thinking "这老头谁啊。。。". Since fagin always asked question, there was an other lady, she thought maybe question was allowed during the talk and tried to ask one. Before she finished her first sentence, the session chair stopped her with "we are running out time, plz take the questions offline...". But fagin kept asking questions and the session chair responded with super smile in his face...Finally, I could quite finish all the stuffs I wished to cover and skipped a lot of details. Later on, my advisor told me that was ronald fagin.....

To me, this SIGMOD was not as exciting as the just-ended STOC09. A major reason is that I can only understand a limited number of talks. Hope next time things will get better.

A couple of technical remarks: one is column store seems becoming very hot now. Another is probabilistic database, many sigmod/pods sessions are devoted for it. See also the most recent issue of the Comm ACM, there is a long artical about it. I think it should be the way we manage uncertain data in real life applications, but just not sure how far it will go, will this theory result in an indispensible commercial system? Let's see.

In Aug, I will go to UIUC for a wireless network summer school and then go to Germany to visit MPI and then to France for VLDB. Crazy summer goes on an on....

Friday, June 5, 2009

one of the STOC 2009 best papers

STOC09 just ended a few days ago. I really enjoyed it.

I met quite a few brilliant chinese theory students. Most of them are knowledgeable and sharp and more importantly

,have a very good taste of theory. Talking to them was a lot fun.

I listened to a lot talks. Basically, for all these talk, it was simply impossible to understand in 20 min the technical details

which are typically considered the major reason for the acceptance of the paper to stoc. However, a notable exception

is one of the two best papers by Moser, a 3-slides-constructive proof of the lovasz local lemma.

The creativity and simplicity of the proof are remarkable.

Check out Lance's blog on this topic for a very short description of this proof.

Thursday, May 28, 2009

vldb 2009

Finally, our paper on ranking over probabilistic datasets is accepted in VLDB. Since I like this paper very much, probly even more than any theory paper I have written so far, I just blog it here, for remembrance. To me, the ideas are neat and the algorithms are elegant, also very practical (I did a lot of experiments).

This paper has been through a pretty tough time. The first draft was our report for the database course project. At that time, the paper was titled "clustering and ranking on prob databases" and devoted much space talking about clustering. We submitted it to vldb08 and got 2 weak accepts and one weak rej. Then the paper was rolloverred (a policy that some boarderline papers can be rolled to the next same level conf(sigmod/vldb) with the same set of reviewers) to sigmod09 and was reviewed by the same set of reviewers but got the same review results.

Finally, we decide to completely remove the clustering part and focus only on the ranking part.

More results, explanations and experiment were added. Especially, I added the ranking approximation part where I borrowed some ideas and results from signal processing (actually fourier transformation). This makes our new ranking function an very effective and efficient one.

Wednesday, May 20, 2009

Godel Prize

I have heard these two papers for a few years, but never got a chance to have a look.

From communication of ACM.

ACM Group Honors Researchers Who Discovered Zig-Zag Graph That Improves Design of Robust Computer Networks
AScribe Newswire (05/19/09)

ACM's Special Interest Group on Algorithms and Computing Theory (SIGACT) and the European Association for Theoretical Computer Science have awarded the Godel Prize to Omer Reingold, Salil Vadhan, and Avi Wigderson for developing a new type of graph that enables the construction of large expander graphs, which are important in designing robust computer networks and constructing theories of error-correcting computer codes. The new zig-zag graph was able to help solve the problem of detecting a path from one node to another in very small storage for undirected graphs. In their paper, "Entropy Waves, the Zig-Zag Graph Product and New Constant Degree Expanders," Reingold, Vadhan, and Wigderson discussed their research on a family of expander graphs, which are used for critical computer theory applications. The sparse but highly connected expander graphs were constructed using the zig-zag graph method, which enables the construction of large expanders from smaller expanders while preserving degree and connectivity. In a second paper, "Undirected Connectivity in Log-Space," Reingold proved that connectivity in undirected graphs can be solved in logarithmic storage, and that any connected graph is a very weak expander, but applying the zig-zag method makes it possible to turn the graph into an expander of only moderately large size. Omer Reingold is a professor of computer science at the Weizmann Institute of Science, Salil Vadhan is a professor of computer science and applied mathematics at Harvard University's School of Engineering and Applied Sciences, and Avi Wigderson is a professor at the School of Mathematics, Institute for Advanced Study. The Godel Prize, which includes an award of $5,000, is named after Kurt Godel in recognition of his major contributions to mathematical logic and the foundations of computer science.

Tuesday, May 12, 2009

The next big thing??

It is called Wolfram/Alpha project. From what I saw from the blog

, it is capable of answering fairly complicated (even scientific) queries which seems to be impossible or very hard for google.

It is claimed as "the next big thing" after windows and google.

and the release time is this month. let's see.

Saturday, May 9, 2009

"tussles?" among theorists

I am becoming a fan of google products and starting to try out various cool things like blogger and calender etc. Blogger allows you to subscribe to several others' blogs and view updates from these aggregately in a single page. I follow a few theoreticians' blogs, like Richard Lipton's P=NP,

there is always a lot of information that interests me.

Here is a quite recent post

Papers that the Gödel prize missed?

in which Dr Lipton listed a few papers that he thought should also deserve the prize but were missed.

There were quite a few replies with many of them suggested some extension to the list.

Among those, I found Mihai's comment which raised interesting debates among those people.

He claimed the prize was too 'complexity biased' and tended to ignore algorithm

(maybe I should say data structure here?) papers.

He also wrote a even more ``interesting'' followup post.

I assume no copyright issue and just cite his first paragraph:

As you probably know, I think theory is in a deep crisis: our obsession to turn into philosophy (prividing unappreciated service to disciplines we do not understand), stemming from our deep commitment to become as ignorant and irrelevant as possible inside the CS department; the influx of people who like to pretend we're Mathematics, and have no idea what computation means; the obsession with the "polynomial" versus "non-polynomial" and it's many consequences; etc etc etc. Since I am not inclined to write novels, I have never been able to put all these arguments inside a coherent text.

I really laughed my ass off by reading this, notwithstanding I don't really agree on it.

I heard that Mihai is quite a controversial figure, but this was the first time I really saw it.

He seems to be a disbeliever of approximation algorithms which is the area I have been working on. Also his words:

"But whenever I say algorithms it should read "algorithms, minus whatever the TCS community is doing in approximation algorithms".

I have serious concerns about our field of approximation algorithms (with important exceptions), and I think it's better to treat it as an entirely separate field."

Although I am myself in no position to judge the community nor his comments, I think what he said is quite thought-provoking. On one hand, TCS is full of amazing masterpieces in every subfield which is totally capable to make people working on that subfield be obsessed and tend to depreciated works on other field. On the other hand, a majority of papers in every field are not that significant and influential, not that novel, not that a breakthrough which always leave some space for others to criticize. My opinion is that a piece of research is useful, if it is not trivial, AND answer (even only a small number of) peoples doubt OR raise people's interests. Among other cs subareas I have read papers from, for eg. DB and Networking, TCS has a much larger proportion of papers satisfying my criteria - my experience is that in theory, as long as the paper is related, no matter it appears in a 1-tier or 2-tier conf, there are something I want to know and need to know while in db or networking, papers from 2-tier confs are mostly used to fill out related work sections or used as inferior examples to compare with other 'superior' ones(maybe because I didn't read as many db or networking papers??).

Just some random thought. Blogging indeed kills a lot spare time, might be better than playing the fantastic game Dota , which is more time-consuming and stressing.

First blog

google's products are always cool.

TCS-DB-er