We consider the problem of spaceefficiently estimating the number of simplices in a hypergraph stream. This is the most natural hypergraph generalization of the highlystudied problem of estimating the number of triangles in a graph stream. Our input is a kuniform hypergraph H with n vertices and m hyperedges, each hyperedge being a ksized subset of vertices. A ksimplex in H is a subhypergraph on k+1 vertices X such that all k+1 possible hyperedges among X exist in H. The goal is to process the hyperedges of H, which arrive in an arbitrary order as a data stream, and compute a good estimate of T_k(H), the number of ksimplices in H.
We design a suite of algorithms for this problem. As with trianglecounting in graphs (which is the special case k = 2), sublinear space is achievable but only under a promise of the form T_k(H) ≥ T. Under such a promise, our algorithms use at most four passes and together imply a space bound of O(ε^{2} log δ^{1} polylog n ⋅ min{(m^{1+1/k})/T, m/(T^{2/(k+1)})}) for each fixed k ≥ 3, in order to guarantee an estimate within (1±ε)T_k(H) with probability ≥ 1δ. We also give a simpler 1pass algorithm that achieves O(ε^{2} log δ^{1} log n⋅ (m/T) (Δ_E + Δ_V^{11/k})) space, where Δ_E (respectively, Δ_V) denotes the maximum number of ksimplices that share a hyperedge (respectively, a vertex), which generalizes a previous result for the k = 2 case. We complement these algorithmic results with space lower bounds of the form Ω(ε^{2}), Ω(m^{1+1/k}/T), Ω(m/T^{11/k}) and Ω(mΔ_V^{1/k}/T) for multipass algorithms and Ω(mΔ_E/T) for 1pass algorithms, which show that some of the dependencies on parameters in our upper bounds are nearly tight. Our techniques extend and generalize several different ideas previously developed for triangle counting in graphs, using appropriate innovations to handle the more complicated combinatorics of hypergraphs.
more »
« less
Maximum Matching in Two, Three, and a Few More Passes Over Graph Streams
We consider the maximum matching problem in the semistreaming model formalized by Feigenbaum, Kannan, McGregor, Suri, and Zhang that is inspired by giant graphs of today. As our main result, we give a twopass (1/2 + 1/16)approximation algorithm for trianglefree graphs and a twopass (1/2 + 1/32)approximation algorithm for general graphs; these improve the approximation ratios of 1/2 + 1/52 for bipartite graphs and 1/2 + 1/140 for general graphs by Konrad, Magniez, and Mathieu. In three passes, we are able to achieve approximation ratios of 1/2 + 1/10 for trianglefree graphs and 1/2 + 1/19.753 for general graphs. We also give a multipass algorithm where we bound the number of passes precisely—we give a (2/3 − ε) approximation algorithm that uses 2/(3ε) passes for trianglefree graphs and 4/(3ε) passes for general graphs. Our algorithms are simple and combinatorial, use O(n log n) space, and (can be implemented to) have O(1) update time per edge.
For general graphs, our multipass algorithm improves the best known deterministic algorithms in terms of the number of passes:
* Ahn and Guha give a (2/3−ε)approximation algorithm that uses O(log(1/ε)/ε2) passes, whereas our (2/3 − ε)approximation algorithm uses 4/(3ε) passes;
* they also give a (1 − ε)approximation algorithm that uses O(log n · poly(1/ε)) passes, where n is the number of vertices of the input graph; although our algorithm is (2/3−ε)approximation, our number of passes do not depend on n.
Earlier multipass algorithms either have a large constant inside bigO notation for the number of passes or the constant cannot be determined due to the involved analysis, so our multipass algorithm should use much fewer passes for approximation ratios bounded slightly below 2/3.
more »
« less
 Award ID(s):
 1650992
 NSFPAR ID:
 10041637
 Date Published:
 Journal Name:
 Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


We give an algorithm to find a minimum cut in an edgeweighted directed graph with n vertices and m edges in O ̃(n · max{m^{2/3}, n}) time. This improves on the 30 year old bound of O ̃(nm) obtained by Hao and Orlin for this problem. Using similar techniques, we also obtain O ̃ (n^2 /ε^2 )time (1+ε)approximation algorithms for both the minimum edge and minimum vertex cuts in directed graphs, for any fixed ε. Before our work, no (1+ε)approximation algorithm better than the exact runtime of O ̃(nm) is known for either problem. Our algorithms follow a twostep template. In the first step, we employ a partial sparsification of the input graph to preserve a critical subset of cut values approximately. In the second step, we design algorithms to find the (edge/vertex) mincut among the preserved cuts from the first step. For edge mincut, we give a new reduction to O ̃ (min{n/m^{1/3} , √n}) calls of any maxflow subroutine, via packing arborescences in the sparsifier. For vertex mincut, we develop new local flow algorithms to identify small unbalanced cuts in the sparsified graph.more » « less

Megow, Nicole ; Smith, Adam (Ed.)Structural balance theory studies stability in networks. Given a nvertex complete graph G = (V,E) whose edges are labeled positive or negative, the graph is considered balanced if every triangle either consists of three positive edges (three mutual "friends"), or one positive edge and two negative edges (two "friends" with a common "enemy"). From a computational perspective, structural balance turns out to be a special case of correlation clustering with the number of clusters at most two. The two main algorithmic problems of interest are: (i) detecting whether a given graph is balanced, or (ii) finding a partition that approximates the frustration index, i.e., the minimum number of edge flips that turn the graph balanced. We study these problems in the streaming model where edges are given one by one and focus on memory efficiency. We provide randomized singlepass algorithms for: (i) determining whether an input graph is balanced with O(log n) memory, and (ii) finding a partition that induces a (1 + ε)approximation to the frustration index with O(n ⋅ polylog(n)) memory. We further provide several new lower bounds, complementing different aspects of our algorithms such as the need for randomization or approximation. To obtain our main results, we develop a method using pseudorandom generators (PRGs) to sample edges between independentlychosen vertices in graph streaming. Furthermore, our algorithm that approximates the frustration index improves the running time of the stateoftheart correlation clustering with two clusters (GiotisGuruswami algorithm [SODA 2006]) from n^O(1/ε²) to O(n²log³n/ε² + n log n ⋅ (1/ε)^O(1/ε⁴)) time for (1+ε)approximation. These results may be of independent interest.more » « less

Chakrabarti, Amit ; Swamy, Chaitanya (Ed.)A Boolean maximum constraint satisfaction problem, MaxCSP(f), is specified by a predicate f:{1,1}^k → {0,1}. An nvariable instance of MaxCSP(f) consists of a list of constraints, each of which applies f to k distinct literals drawn from the n variables. For k = 2, Chou, Golovnev, and Velusamy [Chou et al., 2020] obtained explicit ratios characterizing the √ nspace streaming approximability of every predicate. For k ≥ 3, Chou, Golovnev, Sudan, and Velusamy [Chou et al., 2022] proved a general dichotomy theorem for √ nspace sketching algorithms: For every f, there exists α(f) ∈ (0,1] such that for every ε > 0, MaxCSP(f) is (α(f)ε)approximable by an O(log n)space linear sketching algorithm, but (α(f)+ε)approximation sketching algorithms require Ω(√n) space. In this work, we give closedform expressions for the sketching approximation ratios of multiple families of symmetric Boolean functions. Letting α'_k = 2^{(k1)} (1k^{2})^{(k1)/2}, we show that for odd k ≥ 3, α(kAND) = α'_k, and for even k ≥ 2, α(kAND) = 2α'_{k+1}. Thus, for every k, kAND can be (2o(1))2^{k}approximated by O(log n)space sketching algorithms; we contrast this with a lower bound of Chou, Golovnev, Sudan, Velingker, and Velusamy [Chou et al., 2022] implying that streaming (2+ε)2^{k}approximations require Ω(n) space! We also resolve the ratio for the "atleast(k1)1’s" function for all even k; the "exactly(k+1)/21’s" function for odd k ∈ {3,…,51}; and fifteen other functions. We stress here that for general f, the dichotomy theorem in [Chou et al., 2022] only implies that α(f) can be computed to arbitrary precision in PSPACE, and thus closedform expressions need not have existed a priori. Our analyses involve identifying and exploiting structural "saddlepoint" properties of this dichotomy. Separately, for all threshold functions, we give optimal "biasbased" approximation algorithms generalizing [Chou et al., 2020] while simplifying [Chou et al., 2022]. Finally, we investigate the √ nspace streaming lower bounds in [Chou et al., 2022], and show that they are incomplete for 3AND, i.e., they fail to rule out (α(3AND})ε)approximations in o(√ n) space.more » « less

We present O(log logn)round algorithms in the Massively Parallel Computation (MPC) model, with ˜O(n) memory per machine, that compute a maximal independent set, a 1 + ε approximation of maximum matching, and a 2 + ε approximation of minimum vertex cover, for any nvertex graph and any constant ε > 0. These improve the state of the art as follows: • Our MIS algorithm leads to a simple O(log log Δ)round MIS algorithm in the CONGESTEDCLIQUE model of distributed computing, which improves on the ˜O (plog Δ)round algorithm of Ghaffari [PODC’17]. • OurO(log logn)round (1+ε)approximate maximum matching algorithm simplifies or improves on the following prior work: O(log2 logn)round (1 + ε)approximation algorithm of Czumaj et al. [STOC’18] and O(log logn)round (1 + ε) approximation algorithm of Assadi et al. [arXiv’17]. • Our O(log logn)round (2+ε)approximate minimum vertex cover algorithm improves on an O(log logn)round O(1) approximation of Assadi et al. [arXiv’17].more » « less