Question

I have problem that is reducible to the following: From a collection of stacks, find all...

I have problem that is reducible to the following:

From a collection of stacks, find all items whose "keys" are on all stacks.

My current solution to this problem is to just pop things off as quickly as possible store the items in the language's set type, compute the intersection every once in a while (the stacks are refilling constantly), and use that set, repeat.

The issue here is that the equivalent of the pop operation is expensive (I can do this at most 4k times a second), and it kills the chance of multithreading (as pops are destructive).

Is there a way of doing this such that I can multithread the intersection bit? I cannot practically use shared memory or requeuing items. The threads can communicate with each other via pipes/sockets if need be, but such communication should be kept to a minimum (not really interesting in starting up some whole separate client/server thing just for this part of my application)

As an idea of what I am dealing with, there can be up to 15 million items on the queues at any one time. By the time its "over" all items will have a match.

Ideas?

Clarification: These are message stacks (well queues really but the same thing) over the network. A pop is a consume operation over network, and push is a publish. Thats why its so slow. Since the issues is one of latency rather than bandwidth, multithreading can potentially help. I am not CPU bound. I have ~10 stacks, and up to 10 million items at a time. I have no control over the implementation of the stacks.

0 0
Add a comment Improve this question Transcribed image text
Answer #1

Apparently this is a distributed systems problem. You have ~ 10 stacks, each with up to 10 million items at a time, and each stack living on a different machine.

Here is one simple candidate solution. Have a separate "monitor" machine, whose sole purpose is to continually compute the intersection. Any time one of the stacks is modified, the machine storing that stack should send a message to the monitor message describing the change. (These diffs can be batched, depending upon your latency requirements.)

Now you have all the data on a single machine, the monitor machine. When running on a single machine, you can completely avoid all of the concurrency issues associated with multithreading or distributed systems -- e.g., the need for synchronization, locks, etc.

Moreover, the amount of data is small enough that it could easily be stored in the available RAM on that monitor machine. (Why? If each entry is large, hash it first using a good hash function and a large enough hash output that you won't see collisions. If each hash value is 128 bits, then storing all 100 million hashed items takes 1.6GB. If you use SHA256 truncated to 128 bits as your hash function, by the birthday paradox, the chances of encountering a hash collision are incredibly small.)

Once all of the data is living in RAM on a single machine, the problem becomes much easier. For example, one approach would be to build an index data structure: a hashmap that maps each key to a 10-bit bitmap that indicates which of the stacks it is stored on. You can easily update this hashmap as the individual stacks change.

You can also easily use this hashmap to compute the intersection: just have a doubly linked list threaded through the entries of this hashmap where the bitmap is equal to 11111111111 (all 1's). Any time a bitmap changes from 11111111111 to something else, you remove it from the doubly linked list. Any time a bitmap changes from something else to 11111111111, you insert it into the doubly linked list.

The amount of data you need to transfer from other machines to the monitor machine is very small, especially since you only need to send the hashed items to the monitor, not the original items. If each of the 10 machines will do at most 4k push/pop's per second (taken from your question), then that's 64 KB of hashed data per second per machine. If we add another 32 bytes or so of overhead (packet headers), that's about 200 KB/s of data out of each machine, or about 2 MB/s of data into the monitor machine -- a very manageable amount. On a 1 Gbps Ethernet link you won't notice this, and you might not notice it even on a 100 Mbps link. You can reduce the amount of traffic by a factor of 2-5x by batching such messages and/or by using a shorter hash.

This should provide an efficient algorithm to compute the intersection, and keep it updated on the fly as each of the stacks change.

Add a comment
Know the answer?
Add Answer to:
I have problem that is reducible to the following: From a collection of stacks, find all...
Your Answer:

Post as a guest

Your Name:

What's your source?

Earn Coins

Coins can be redeemed for fabulous gifts.

Not the answer you're looking for? Ask your own homework help question. Our experts will answer your question WITHIN MINUTES for Free.
Similar Homework Help Questions
  • I have a question on an assignment for C++. I have my code available here and...

    I have a question on an assignment for C++. I have my code available here and just want someone to look over it using MS VS 2010 express, for that is what my instructor is using. Write a program that mimics a calculator. The program should take as input two integers and the opreation to be performed. It should then output the numbers, the operator, and the result. (For division, if the denominator is zero, output an appropriate message.) //**This...

  • help finish Queue, don't think I have the right thing. # 1. After studying the Stack...

    help finish Queue, don't think I have the right thing. # 1. After studying the Stack class and testStack() functions in stack.py # complete the Queue class below (and test it with the testQueue function) # # 2. Afer studying and testing the Circle class in circle.py, # complete the Rectangle class below (and test it with the testRectangle function) # # # 3. SUBMIT THIS ONE FILE, with your updates, TO ICON. # # # NOTE: you may certainly...

  • HELLO I AM CURRENTLY IN DIFFERENTIAL EQUATIONS PLEASE EXPLAIN EACH STEP SO I CAN LEARN FROM...

    HELLO I AM CURRENTLY IN DIFFERENTIAL EQUATIONS PLEASE EXPLAIN EACH STEP SO I CAN LEARN FROM YOU (I KNOW SOME PEOPLE ONLY CARE ABOUT TEH ANSWER, BUT WILL REALLY APPRECIATE IT) TO SAVE TIME FEEL FREE TO JUST SAY A LAW, THEOREM, OR CONCEPT FOR AN EXPLANATION AND I CAN GO AHEAD AND STUDY IT ON MY OWN. i REALLY DO READ THESE VERY CAREFULLY AND USE THE COMMENTS OFTEN, SO JUST A LITTLE HEADS UP. I FIND IT DIFFICULT...

  • You just passed the bar and are working at a small firm in Temple, TX when a new client leaves the following voice mail that was forwarded to you from the senior partners: "Sir, I am going to be s...

    You just passed the bar and are working at a small firm in Temple, TX when a new client leaves the following voice mail that was forwarded to you from the senior partners: "Sir, I am going to be starting a business and wanted to get some advice about getting started. I have heard these ads on the radio about Legal Zoom for incorporating a business and wonder whether I should do anything special, like incorporating, or have to do...

  • Need help with a C++ problem I have

    For my assignment I have to write a program that creates a restaurant billing system. I think that I got it correct but it says that it is incorrect, so I was wondering if someone would be able to look over my code. It says that I need tax of $0.20, and the amount due to be $4.10, which is what I have in my output.https://imgur.com/a/jgglmvWMy issue is that I have the correct outcome when I run my program but...

  • Explain Autonomy (Task, Time, Team, and Technique) that you have at your present job. What would...

    Explain Autonomy (Task, Time, Team, and Technique) that you have at your present job. What would you change? Based on book, Drive by Daniel Pink (but don't need book to answer Q) - Read below or explanation and definitions " Pink argues that for a person to be motivated completely, he or she must have autonomy over four things in his or her work environment: Task, Time, Technique, and Team. without reinventing the wheel, allow me to explain (you should...

  • I have an Assignment of Marketing Research on Climate Change. Where i have to take an...

    I have an Assignment of Marketing Research on Climate Change. Where i have to take an interview of an industry professional, which is done. I have all answer what he said, now i just need to analysis all answer into sub category which are as follow: - 1. Level of Concern of Professionals 2. Impacts on Industry 3. Awareness of Millennials' Knowledge 4. Attitudes Among Millennials If you think any other category could be included please add it. i have...

  • I would like help, I am unsure how to solve the following questions from my study...

    I would like help, I am unsure how to solve the following questions from my study guide. I am no confused because I have to use an excel sheet upon solving them 1. The current cost function for a lab that evaluates blood tests is C=400,000 + 20Q, where Q is the number of tests performed annually. If the lab expects to perform 50,000 tests annually, what are the average costs per evaluation? 2. A clinic finds that it can...

  • ne of my current job is working as Wheeled Mechanic for the National Guard. I have...

    ne of my current job is working as Wheeled Mechanic for the National Guard. I have never worked as a mechanic in my civilian life but I do like working on fixing things around the house. As a result, I decided to work as a mechanic for the Army National Guard.    The job characteristic model has five different area. In reference to my job as wheeled mechanic and applying the skill variety, which is the skills and activities needed...

  • write a memo that have an opening paragraph that briefly states the issues, a second paragraph...

    write a memo that have an opening paragraph that briefly states the issues, a second paragraph that gives detailed facts and a concluding paragraph that gives your recommendation. The memo should give the facts only and should not "blame" anyone. Avoid using adjective, pronouns, names, etc. Just state the facts. Address the memo to Dr. Hucks. The memo should be no longer than two pages. Ethical Issues "Josey," Jordan asked with a puzzled look on his face. "I don't understand...

ADVERTISEMENT
Free Homework Help App
Download From Google Play
Scan Your Homework
to Get Instant Free Answers
Need Online Homework Help?
Ask a Question
Get Answers For Free
Most questions answered within 3 hours.
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT