Description: Katz-Lazarsfeld vs Watts-Dodds
FunData Twitter Shootout -- OCaml
=================================

This is the OCaml version of the [Functional Data Shootout #1](http://github.com/alexy/fundata1), implementing Karmic Social Capital (KSC) on the Twitter communication graph.  The _fundata1_ repository contains all the necessary documentation about the shootout and the data itself.  Here we provide some implementation notes specific to [clams -- the OCaml implementation of fundata1](http://github.com/alexy/clams).


Building
--------

	make
	
The main executable is `sc.opt`, natively compiled.
	
I tried to learn `omake` here, and it builds the executable, but that then fails to process command line arguments properly.  I'm probably missing something.

Running
-------

You have to set `OCAMLRUNPARAM` to enlarge the heap, or it will take forever (`h` is the initial major heap size, `s` the minor heap size).  Smaller values may work:

	export OCAMLRUNPARAM='h=5G;s=1G'

	time ./sc.opt    [maxDays]
	
It reads the graph from the two input files given as arguments, which can be either `.json.hdb` or `.mlb` files.  The latter are produced by `save_graph.opt`, which reads the former and saves them in OCaml's own binary format via `Marshal.to_channel`.  The result is uncompressed.
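
As a minimal sketch of the `Marshal` round trip described above (the helper names `save_value` and `load_value` are hypothetical and are not taken from _clams_):

```ocaml
(* Hypothetical helpers, not the actual clams code: dump any OCaml value
   to a binary file and read it back with the standard Marshal module. *)
let save_value (path : string) (v : 'a) : unit =
  let oc = open_out_bin path in
  Marshal.to_channel oc v [];   (* [] = no extra marshalling flags *)
  close_out oc

(* Marshal.from_channel is not type-safe: the caller must annotate the
   result with the type that was originally written. *)
let load_value (path : string) : 'a =
  let ic = open_in_bin path in
  let v = Marshal.from_channel ic in
  close_in ic;
  v
```

The file is opened in binary mode on both sides, and the reader has to state the expected type, e.g. `let g : (string, int list) Hashtbl.t = load_value "graph.mlb"`.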
	
	
Data Preparation
----------------

If you want to create a smaller version of the original graphs, you can use the `save_graph` program.  It's invoked as follows:

	./save_graph.opt   [N] [progress]
	
If N is given, it limits the number of entries read from the original to N -- only those are used to build the OCaml graph, which is then dumped into `.mlb`.  If progress is given, a dot is printed each time that many entries have been read; the default is 10,000.
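
The limit-and-progress behaviour can be sketched roughly as follows.  This is only an illustration over a plain list; `iter_limited` and its labels are made-up names, and the real `save_graph` reads from the original graph files instead:

```ocaml
(* Hypothetical helper: apply [f] to at most [limit] items, printing a dot
   to stderr after every [progress] items (assumed to be positive).
   Returns the number of items actually processed. *)
let iter_limited ?limit ?(progress = 10_000) (f : 'a -> unit) (items : 'a list) : int =
  let reached count =
    match limit with
    | Some n -> count >= n
    | None -> false
  in
  let rec go count = function
    | [] -> count
    | _ when reached count -> count
    | x :: rest ->
        f x;
        let count = count + 1 in
        if count mod progress = 0 then begin
          prerr_char '.';
          flush stderr
        end;
        go count rest
  in
  go 0 items
```

Leaving out `~limit` processes everything, which matches running `save_graph.opt` without N.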


Mostly Functional
-----------------

This is a translation of [husky](http://github.com/alexy/husky), a Haskell implementation of KSC, into OCaml.  At first, _husky_ was crashing under _ghc 6.12.1_ which had a GC bug.  While Simon Marlow valiantly fought and quashed it, I translated _husky_ to _clams_.  

Some things stand out in OCaml, e.g.

-- floating-point operators are distinct from integer ones, and contain a dot: `+. *. /.`  

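-- equality is tested with `=` (structural) or `==` (physical) -- but `== 0.` fails for `float`s, because `==` compares boxed floats by identity rather than by value; use `= 0.` instead.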

-- `safeDivide x y` doesn't check for `y = 0.`, but rather goes ahead with the division and then checks whether the result is `nan` or `infinite`.
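
The points above can be sketched in a few lines.  The `safe_divide` below is a reconstruction of the idea only; its name, its `option` return type, and the `mean` and `is_zero` helpers are assumptions and not the actual _clams_ code:

```ocaml
(* Float arithmetic needs the dotted operators; + * / are for ints only. *)
let mean (a : float) (b : float) : float = (a +. b) /. 2.

(* Compare floats with structural equality (=).  Physical equality (==)
   on floats is not reliable, so it is avoided here. *)
let is_zero (x : float) : bool = (x = 0.)

(* Assumed shape of the safeDivide idea: divide first, then inspect the
   result instead of testing the divisor beforehand. *)
let safe_divide (x : float) (y : float) : float option =
  let q = x /. y in
  match classify_float q with
  | FP_nan | FP_infinite -> None
  | FP_normal | FP_subnormal | FP_zero -> Some q
```

Checking the result rather than the divisor also catches `0. /. 0.` (which is `nan`) and overflow to infinity in a single test.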

Just a Little Bit Imperative
----------------------------

The _map_ data structure used here is OCaml's standard `Hashtbl`, which is imperative (the standard library also has a purely functional `Map`).  At creation time, it takes an initial size; we just go ahead and allocate it all for the number of users, i.e. 

	let orderN = 5000000 in
	Hashtbl.create orderN;
	
Hence instead of folds, some operations are `List.iter`, and all those updating a `Hashtbl` have type `unit` and value `()`.  This can give a moment's pause when translating back to a purely functional map.
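
To illustrate the difference, here is a small sketch that counts occurrences of user names both ways.  The function names and the counting task are made up for the example and do not come from _clams_:

```ocaml
module StringMap = Map.Make (String)

(* Imperative style: mutate the table in place; the result is unit. *)
let count_imperative (tbl : (string, int) Hashtbl.t) (users : string list) : unit =
  List.iter
    (fun u ->
      let n = try Hashtbl.find tbl u with Not_found -> 0 in
      Hashtbl.replace tbl u (n + 1))
    users

(* Functional style: thread an immutable map through a fold and return it. *)
let count_functional (users : string list) : int StringMap.t =
  List.fold_left
    (fun m u ->
      let n = try StringMap.find u m with Not_found -> 0 in
      StringMap.add u (n + 1) m)
    StringMap.empty users
```

The `List.iter` version returns `()` and leaves its result in `tbl`, while the fold returns the new map as its value, which is the shape a Haskell translation would expect.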

Better Maps
-----------

Mauricio Fernandez and Matías Giovannini developed [faster maps for OCaml](http://eigenclass.org/R2/writings/finite-map-benchmarks).  Some are functional, like `ternary`, some are imperative, like `Fasthashtbl`.  The latter works best when there are no deletions, which is the case here.

