International-Development-Linked-Data-Tools-
文件大小: unknow
源码售价: 5 个金币 积分规则     积分充值
资源说明:Conversion tools for working with IATI and Research for Development resources as linked data
# IATI and R4D Conversion Scripts

## Set up

These scripts have been developed and tested as root on a Turnkey Linux Core server image (Ubuntu 10.04 LTS). 

They should be cloned to /root/scripts/

### Dependencies

Pytassium, ckan, ckanclient, lxml, argparse

    easy_install pytassium
	easy_install ckan
	easy_install ckanclient
	easy_install lxml
	easy_install argparse

xsltproc

    apt-get install xsltproc

Jena

## IATI to RDF Conversion Scripts

These scripts fetch the latest IATI data from the IATI Registry and convert it to RDF before uploading to Kasabi platform. When re-run they check for changes to the data, and upload a changeset rather than original file, managing additions and removals. Changeset are archived. 

### Download and conversion process

Run the download script. This read through the IATI Registry, checking the checksum of files against those of files already downloaded. 

    python ~/scripts/IATI/download.py

To just download a particular donors files add **--groups groupname** as a parameter. E.g. 

    python download.py --groups dfid wb

To just fetch packages from DFID and the World Bank.

At this point you can either go directory by directory with a shell script that performs a straight XSLT->RDF->NTriples conversion, or use a set of python scripts that also look to generate changesets to synchronise data, including removing deleted data during the sync process.

#### Python route
Then change to ~/iati/data/packages/ and run

    python ~/scripts/IATI/convert_to_rdf.py

#### Shell script route
Change into each directory and run:

    ~/scripts/clprocess.sh

### Upload process

And then

    python ~/iati/scripts/upload.py --apikey=APIKEY --dataset=DATASET

Where APIKEY is your API Key from Kasabi.com and DATASET is the slug of the store you want to add data to. 

**If codelists have additions (see ToDo below):**

    python ~/scripts/IATI/codelists.py

the change to /iati/codelists and run

    python ~/scripts/IATI/iati_upload.py


## R4D scripts

These scripts convert the RDFXML RDF Dump from Research for Development to N3 and manage the creation of changesets to upload to Kasabi. 

### To use

Place the latest RDF dump into the ~/r4d/data/ folder with a .xml extension 

Run 

    python ~/scripts/R4D/process.py

This should convert the file and generate changesets. You can then run:

    python ~/scripts/upload.py --apikey=APIKEY --dataset=DATASET

From the ~/r4d/data/ directory where APIKEY is your API Key from Kasabi.com and DATASET is the slug of the store you want to add data to. 


# ToDo

* Updated the RDF model (possibly to convert direct to N3 so we can avoid conversion in the script to speed things up)
* Update so this does not rely on running as root
* IATI Codelist sync scripts do not currently hand for changes to codes / definitions. Need to implement proper revision control here. 
* Update the upload script to perform better logging, and to note when a .nt has not changed since last upload in order to not duplicate uploads
* Provide this all as a Turnkey Backup image anyone can grab a copy of (or similar)
* Work out how to schedule this to run weekly/monthly firing up a server and shutting it off when done

# Running from Turnkey Linux Backups

## Set up the server image:
* Either: Download the Turnkey Core image from: http://www.turnkeylinux.org/core and launch in your using your preferred VM software of choice (e.g. https://www.virtualbox.org/)
* Or: Launch to the EC2 Cloud using Turnkey Hub

(Note, if deploying using Virtual box you might need to set up port forwarding to allow you to SSH to it. See http://www.virtualbox.org/manual/ch06.html#natforward for details.)

Once connected for the first time:
* 

## Restore the latest backup

* SSH to the server (either on port 22, or via the web-based SSH console accessible on port 12320 - e.g. http://localhost:12320)
* Enter the username and password

## Run the updates

* SCP or wget the latest R4D data dump to /root/r4d/data and rename it as latest.rdf

E.g. if the server image is running through Turnkey Hub as r4d.tklapp.com
 
    scp r4dlastest.txt root@r4d.tklapp.com:/root/r4d/data/latest.rdf

Then

    cd /root/r4d/data/
    python ~/scripts/R4D/process.py
    python ~/scripts/upload.py --apikey=APIKEY --dataset=r4d-aid-data
    cd /

From the ~/r4d/data/ directory where APIKEY is your API Key from Kasabi.com and DATASET is the slug of the store you want to add data to. 

## Run the IATI Updates


本源码包内暂不包含可直接显示的源代码文件,请下载源码包。