TCS-DB-er: August 2009

Sunday, August 30, 2009

VLDB09 - journal like review process.

This year's vldb was held in Lyon, France. I came back from there yesterday night. The jet-lag woke me at 5pm...so I have some time to do some notes here.

The first thing i want to note is the undergoing change of the review process of Proc. of VLDB. This year' s vldb includes a couple of (<=10) papers that have gone through a journal like review process(multiple rounds), rather than the traditional one round thing for conference. Next year's proceeding will include 10% of such papers. From 2011, there will be no traditional conference PC and everything will be shifted to the journal-like track.

I talked to quite a few of my friends. Most of them did not seem quite enthusiastic about the transition. I guess the reason might be that they are able to get papers accepted anyway and the multi-round process only implies an increasesing workload but not a higher chance to them. Literally quote (translated into english) some of their words "I bet the review will be 'more experiments on xxx', 'more references on xxx' , ' more xxx on xxx' "; "more experiments, on a new dataset, these are only easy to say"," the worst thing could happen is that a reviewer didn't like the paper intially but asked for a major revision rather than gave a outright reject. Then the paper was turned around multiple times, the content was rewritten and exp was redone and the reviewer still didn't like the paper and finally rejected it"......I have a few journal papers and went through the multiround process myself twice. Fortunately, my experience was not that bad. The first paper was submitted to transaction on information theory. we got two quite positive reviews. One review was like 5 pages long with numerous detailed comments, but each of those was fairly easy to fix and the paper got accepted after that round. The other one was a very short paper which i submitted to Information Processing Letter. The process was even smoother. I got two reviews and one of them only had one sentence for recommending acceptence. Responsing the other review took me less than two hours and the paper was accetped after this round. I hope this new vldb journal thing can improve the paper quality substantially without bringing us too much pain....

Saturday, August 22, 2009

Some random notes from MPI

Today is the last day of my stay in Saarbrucken, Germany. During the last two weeks, I have been here visiting Julian Mestre, a former student of my current advisor Samir Khuller.

We have been working on a stochastic optimization problem and already had some exciting progress. Cooperating with Julian is a lot fun. We have a big overlap of our background, which makes the catchup very easy, meanwhile our thinking patterns are quite different and complements to each other well. (As I observed) Julian tends to start off from concrete examples (worst/average/best case examples), making insightful observation and then generalizing until the discover of the solution which works for all case. The way I think usually begins with some very vague framework and then filling in details and testing whether it works for concrete examples. if it didn't work, then try to patch it if possible or just scratch it and start from a new framework or just find a new problem to consider...
I will write some thing more about the interesting problem we are considering at some later point when appropiate.

But I would like to make some notes for some random stuffs I learned during the visit.

1) the power of two choices: consider the following random experiment: throw n balls randomly into n bins (by randomly, i mean for each ball, uniformly and randomly choose a bin to put the ball.) It is a well known fact that the expected maximum load of any bin is O(logn/loglogn). This means this purely random procedure is not a very good scheme to achieve load balancing, albeit it has already been used in many places for this purpose.

However, if instead of completely random throw of each ball, we randomly pick two bins and throw the ball into the bin with smaller load, the expected maximum load becomes O(loglogn). With only one more choice and very limited rationalness (make judgement only between two choices), we can achieve a exponential improvement. Fascinating!

For more general results and references of its numerous applications, see Michael Mitzenmacher's phd thesis.

2)Compressed Sensing: this is now a super hot topic in ECE (also some branches of CS related to signal, compression, etc.) during the recent breaktrhough by Candes, Romberg, Tao and Donoho. But it seems to me not many TCS and DB people (my impression)are high on this topic (I am only aware Cormode and Muthukrishnan have done some work on this). I read through a basic tutorial and browse a few paper abstracts on this and got some rough ideas about the field. Not only the theory borrows ideas from a variety of branchs in applied math and nicely fit them togther, but the practical implication are also remarkable. Many traditional compression schemes will be (or have been) rewritten due to the new developed theory and techniques. My question is whether it can be applied to the approximate query processing and/or query optimization in some way to achieve a better performance than wavelet which has been extensively used for those applications.

3) A question I don't have a good answer: Raghav told me this cute problem when he was reading the book '' the probabilistic method'' by Alon and Spencer. It is a exercise problem on the first chapter.

x and y are two independent identically distributed random variable.
Prove: Pr(|x-y|<=1) >= Pr (1<=x-y<=2) (no absolute value in RHS) I suspect the following (tighter generalization) is also true Pr(0<=x-y<=t) >= Pr(a<=x-y<=a+t) for any a,t>=0

If you try to do it in a brute-force way, it boils down to proving an fairly clean inequality.
But the inequality seems not quite so easy to prove. I have some idea how to prove it, but it is pretty messy and i don't have enough incentive to figure out the details.
Since it is only an exercise problem in the first chapter, I believe the problem should have a very simple but tricky solution. Let me know if you know the solution.

Tuesday, August 4, 2009

The second day.

The first talk was by Nitin. Some basic knowledge about wireless networks - however, it is completely new to me.

Then, Dina Katabi gave a talk about network coding, actually mainly about her recent sigcomm papers. Her talk was very light and easy to follow. Some gossip:). There was post a while ago on mitbbs computer science board. The post raised the following "interesting" question "who is the prettiest cs phd female student you've even seen." Many candidates were proposed and numerous argument ensued , google photo search was extensively used, personal stories one after another, ppl's aesthetics tastes were questioned.......it was a mess all in all. Nevertheless, Dina was consensusly aknowledged as one of the prettiest! I have to say i sat pretty far from her and wouldn't be able to personally justify this claim..:P

I slept over the second network coding talk.

In the afternoon, P.R.Kumar gave a very good talk on the QoS scheduling(this paper). The model proposed is fairly clean and a surprisingly neat neccessary and sufficient condition was proven. The analysis is a ECE type analysis rather the worst case analysis which TCS community usually adopted. In the proof, a very interesting theorem, called blackwell approachability theorem, was used. It was the first time I saw this theorem and it seems to be a pretty useful theorem. The theorem basically describes as follows:

We have a game that goes in iterations. in each iteration t, we can take some action a(t) and will
get a reward v(t).

v(t) is a n-dim random vector s.t.
(1) its distrbution depends only on a(t)
(2) Mean = R(a) (R(a) is a vector)

Let A be a close set in R^n.
We say A is approachable if there is some policy (strategy of taking actions) s.t.
the time average of rewards (\sum_{t\in T}v(t)*(1/T)) will come arbitrarily close to A (or in A) for enough number of iterations with arbitrary high prob.

The Blackwell thm says that
for any x\notin A, if there is an action a with R(a) lies on the other side of the hyperplane H
that passing through the point of A closest of x and perpendicular to line xy
then A is approachable.

Monday, August 3, 2009

the UIUC wireless summer school- first day

Today, we had two EE talks which require substantial background knowledge on communication that i don't have.....:( so I didn't follow much.

I guess tomorrow will be much better since we will have CS talks!

We also had a poster session this afternoon and there were like twenty posters there.
Among those, I want to mention one that is about a very interesting problem, the index coding problem. It is a network coding type problem but the formulation is much cleaner.

The problem describes like this. The source have a {0,1} string X of length n.
There are a bunch of receiver r_i. Each r_i is particular interested in knowing the x_i (th) bit of X. Each r_i also (already) knows a subset of bits of X (this is called the side information).
Find a coding scheme, such that each r_i will know the x_i bit of X and the source broadcasts as few bits as possible ("broadcast" means everybody can hear it).

For a motivating example, suppose each r_i knows X except the x_i(th) bit.
Then the source only need to broadcast the parity of X (number of 1's in X) which takes only 1 bits).

The facinating part of the problem is that the optimal coding scheme can be charaterized by a rank minimization problem on the adjacency matrix of the "side information graph" and also quite related the shannon capacity of a graph and also has similar type of sanwich property.
Refer to this paper for even more interesting details.

TCS-DB-er