Java disc based HashMap


I'm working on a web crawler (please don't suggest an existing one, it's not an option). I have it working the way it is expected to. The issue is that I'm using a sort of server/client model, where the server does the crawling and processes the data, which is then put in a central location.

This location is an object created from a class I wrote. Internally the class maintains a hashmap defined as HashMap<String, HashMap<String, String>>

I store the data in the map by making the URL the key (I keep these unique), and the HashMap value stores the corresponding data fields for that URL, such as title, value, etc.
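
The layout described above can be sketched roughly as follows. This is only an illustration, not the asker's actual class: the name `CrawlStore` and its methods are hypothetical, and `ConcurrentHashMap` is swapped in for the plain `HashMap` on the assumption that several crawler threads write to the store at once.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the central store described in the question.
class CrawlStore {
    // URL -> (field name -> field value), e.g. "title" -> "Example"
    private final Map<String, Map<String, String>> pages = new ConcurrentHashMap<>();

    // Record one field for a URL, creating the inner map on first use.
    // ConcurrentHashMap rejects null keys and values, so callers must not pass null.
    void put(String url, String field, String value) {
        pages.computeIfAbsent(url, k -> new ConcurrentHashMap<>()).put(field, value);
    }

    // Return the stored value, or null if the URL or the field is unknown.
    String get(String url, String field) {
        Map<String, String> fields = pages.get(url);
        return fields == null ? null : fields.get(field);
    }

    // Number of distinct URLs stored so far.
    int size() {
        return pages.size();
    }
}
```

Everything here still lives on the heap, so it shows the shape of the data rather than a fix for the memory growth.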

I serialize the internal objects used, but the spider is multi-threaded, and as soon as I have 5 threads crawling, the memory requirements go up exponentially.

So far the performance has been excellent with the HashMap, crawling 15k URLs in 2.r minutes (30 seconds of CPU time), so I don't need to be pointed in the direction of an existing spider, as forum users have suggested.

Can anyone suggest a fast disc based solution that will support concurrent reading & writing? The data structure doesn't have to be the same, it just needs to be able to store related meta tag values etc.

Thanks in advance

I suggest using Ehcache for this, even though what you're building isn't really a cache. Ehcache allows you to configure a cache instance so that it overflows to disc storage, while keeping the most recent items in memory. It can also be configured to be disc-persistent, i.e. data is flushed to disc on shutdown, and read back into memory at startup. On top of that, it's key-value based, so it fits your model. It supports concurrent access, and since the disk storage is managed by a separate thread, you shouldn't need to worry about disk access concurrency.
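
To make the overflow idea concrete, here is a small sketch using only the standard library. It is not Ehcache's API and not how Ehcache is implemented: the class `OverflowMap`, its methods, and the one-file-per-key layout are all assumptions made for illustration. It keeps the most recently used entries in memory and spills the least recently used ones to disk with Java serialization. Thread safety comes from coarse `synchronized` methods rather than a separate disk thread.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of "recent entries in memory, the rest on disk".
class OverflowMap {
    private final Path dir;
    private final int maxInMemory;
    // accessOrder = true: iteration starts at the least recently used entry.
    private final LinkedHashMap<String, HashMap<String, String>> memory =
            new LinkedHashMap<>(16, 0.75f, true);

    OverflowMap(Path dir, int maxInMemory) throws IOException {
        this.dir = Files.createDirectories(dir);
        this.maxInMemory = maxInMemory;
    }

    // One file per key, named by URL-safe Base64 of the key.
    // Very long URLs could exceed file name limits; a real store would hash instead.
    private Path fileFor(String key) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        return dir.resolve(Base64.getUrlEncoder().withoutPadding().encodeToString(bytes));
    }

    synchronized void put(String key, HashMap<String, String> value) throws IOException {
        Files.deleteIfExists(fileFor(key)); // drop any stale spilled copy
        memory.put(key, value);
        evictIfNeeded();
    }

    synchronized HashMap<String, String> get(String key)
            throws IOException, ClassNotFoundException {
        HashMap<String, String> value = memory.get(key);
        if (value != null) {
            return value;
        }
        Path file = fileFor(key);
        if (!Files.exists(file)) {
            return null;
        }
        HashMap<String, String> loaded;
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            @SuppressWarnings("unchecked")
            HashMap<String, String> read = (HashMap<String, String>) in.readObject();
            loaded = read;
        }
        Files.delete(file);      // the entry moves back into memory
        memory.put(key, loaded);
        evictIfNeeded();
        return loaded;
    }

    synchronized int inMemory() {
        return memory.size();
    }

    // Write least recently used entries to disk until the memory limit is respected.
    private void evictIfNeeded() throws IOException {
        Iterator<Map.Entry<String, HashMap<String, String>>> it = memory.entrySet().iterator();
        while (memory.size() > maxInMemory && it.hasNext()) {
            Map.Entry<String, HashMap<String, String>> eldest = it.next();
            Path file = fileFor(eldest.getKey());
            try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
                out.writeObject(eldest.getValue());
            }
            it.remove();
        }
    }
}
```

A library such as Ehcache handles the same trade-off with far more care (indexing, background writes, persistence across restarts), which is the reason to prefer it over a hand-rolled version like this.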

Alternatively, you could consider a proper embedded database such as Hypersonic (or numerous others of a similar style), but that's going to be more work.

