Description: Python playlist generator. Parses an iTunes XML export, pulls artist and track data from last.fm for all music, and uses that data to generate playlists.
#**Pylistfm**#

  The goal of pylistfm is to offer a means for the creation of playlists where
the probability of any given song or artist being added to a playlist can be
weighted with the data available from last.fm on the song or artist. These
playlists are meant to offer an alternative to hitting the 'randomize' button
on your music player of choice. In this era, music libraries rarely consist
solely of best-of songs. More commonly music libraries are built around
albums, or even entire discographies. The randomize button loses its value as
a useful playlist generation tool when, given an arbitrary but particular
artist, a user would be equally likely to listen to a song by that artist
which is, subjectively, worse than average, as the user would be to listen to
a song which is subjectively better than average. The issue worsens when for
the majority of artists in a user's library, the user would not even want to
listen to an average song by that artist. Pylistfm offers an alternative under
which the chance of a particular song by a given artist being chosen is
weighted by its relative popularity according to global listening data pulled
from the last.fm web services API.

In addition, by being open-source, pylistfm offers even those with minimal
programming experience the ability to modify the song-picking algorithm in any
way they see fit. One can also export data about their music library or any
subset of their library to visualize or apply the data however the user
chooses. Two playlist-selection algorithms come bundled into the pylistfm
package: one based purely on track listening counts, and one based on
the ratio of listens to listeners.

The first algorithm, based on track listening counts, is best explained
through example. John, a pylistfm user, has 10,000 tracks in his library, with
100 of them being by, for this example, The Beatles. The probability of the
first song being added to the playlist being by The Beatles is 100 / 10000, or
1%. Let's say the random number generator hits that 1% chance, deciding that a
Beatles song will be the first on the playlist. In a traditional pure
randomization playlist, each song would have a 1/100 or 1% chance of being
chosen. However, pylistfm weights the probability of each song being chosen by
the total number of listens to that song from all last.fm users across the
globe. If the song Come Together has 1,000,000 listens, that song has a
1,000,000 / n chance of being chosen, where n is equal to the sum of all
listen counts of all of the Beatles songs he has in his library. If that sum
is 40,000,000 listens, he now has a 1/40 chance to hear Come Together. Once
the first song is chosen, that song is removed from the potential tracks and
the process is repeated, this time with the Beatles having a 99 / 9,999 chance
of being picked for the second song, or a .99% probability.
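
The selection steps in the example above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code from filters.py: the names pick_track and make_playlist_sketch and the dict layout (artist name mapped to track titles and their listen counts) are hypothetical, and every listen count is assumed to be greater than zero.

```python
import random


def pick_track(library, rng=random):
    """Pick one (artist, title) pair and remove it from the library.

    library is a hypothetical layout: {artist name: {track title: listen count}}.
    """
    artists = list(library)
    # Artist weight = number of tracks that artist still has in the library.
    artist_weights = [len(library[name]) for name in artists]
    artist = rng.choices(artists, weights=artist_weights)[0]

    tracks = library[artist]
    titles = list(tracks)
    # Track weight = global listen count for that track (assumed > 0).
    track_weights = [tracks[title] for title in titles]
    title = rng.choices(titles, weights=track_weights)[0]

    # Remove the chosen track so it cannot be picked twice.
    del tracks[title]
    if not tracks:
        del library[artist]
    return artist, title


def make_playlist_sketch(library, songcount, rng=random):
    """Repeat the weighted pick until songcount tracks are chosen or none remain."""
    playlist = []
    while library and len(playlist) < songcount:
        playlist.append(pick_track(library, rng))
    return playlist
```

With John's numbers, The Beatles would carry a weight of 100 out of a total of 10,000 on the first pick, and Come Together a weight of 1,000,000 out of 40,000,000 among the Beatles tracks.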

The second algorithm, based on the ratio of listens to listeners, effectively
weights track choice by the average number of times a song is listened to by
last.fm users. The idea behind this selection process is that the better songs
in your library will have been listened to, by each last.fm listener of the
song, more times on average than songs which are not as enjoyable. Artist
choice is weighted by the average of the ratio values of the track objects in
the artist object's .tracks field.
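
As a rough illustration of the ratio weighting, the sketch below computes a per-track ratio and a per-artist average. The function names are hypothetical, and treating a missing or zero listener count as "no valid ratio" is an assumption made for this example rather than a documented behavior of pylistfm.

```python
def track_ratio(lfm_playcount, listener_count):
    """Average plays per listener, or None when the data is unusable."""
    if lfm_playcount is None or not listener_count:
        # Assumption: missing data or zero listeners means no valid ratio.
        return None
    return lfm_playcount / listener_count


def artist_ratio(track_ratios):
    """Mean of the valid track ratios, or None if there are none."""
    valid = [ratio for ratio in track_ratios if ratio is not None]
    if not valid:
        return None
    return sum(valid) / len(valid)
```

A track with 1,000,000 plays from 250,000 listeners gets a ratio of 4.0, so it would be weighted twice as heavily as a track by the same artist with a ratio of 2.0.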


##**Technical Documentation:**##

  * **lfmgather.py**

    * Primary definitions file. Run this in an interactive console to run
pylistfm.

    * Class Hybrid\_Track definition. Hybrid\_Track objects are what represent
processed songs, storing data in the following fields: track (track name),
artist\_name, album\_name, location, itunes\_id, track\_number, track\_count,
file\_duration, bit\_rate, sample\_rate, playcount, artist (stores a
pylast.Artist object reference), album (stores a pylast.Album object
reference), listener\_count (int), lfm\_playcount (int). Excluding the exceptions
noted above, each of these fields stores a unicode string assuming that data
exists for that field. If data does not exist for a field, it contains a
reference to a NoneType object.

    * get\_lfm\_info(itunes\_library) accepts a parameter containing a
pylistxml.Itunes_Library object. Converts each track and artist object in the
Itunes_Library object into Hybrid_Track and pylast.Artist objects,
respectively. Saves a file containing a list of pylast.Artist objects, with
each object containing all Hybrid_Track objects belonging to that Artist
object in the object's .tracks field. This file is saved as
'incompleteartists.db'. Tracks which failed the conversion process are saved as
'failedtracks.db'.

    * process_info(fname='incompleteartists.db',v=False) accepts the filename
of the saved list of Artist objects generated by get_lfm_info(). v can be set
to True if you want to see a line of text for each artist and track processed.
It fills in the Artist.sum_playcount field with the sum value of playcounts of
all Hybrid_Track objects in its tracks field, and it fills in the
Artist.playcount field with the total number of times a particular artist has
been played on last.fm by all users. In addition, each Hybrid_Track object has
its get_data() method called, filling in the object's listener_count and
lfm_playcount field with integer values representing the total number of
listeners and the total number of times listened, respectively. Once all of
these fields have been filled in for all track and artist objects,
calculate_ratios() is called, filling in a ratio field for every track and
artist object. The ratio for a track is a float equal to lfm_playcount /
listener_count. The ratio for an artist object is the average ratio value for
the track objects in its tracks field. Artist objects also have their
.trackcount field filled in with the integer value of the length of their
tracks field. Tracks and artists without valid ratio data are then removed. To
avoid a track having invalid data, make sure that the track's ID3 tags are
correct.

    * make_progress(): The function which advances the stage of data parsing
and processing. This function will direct you from having just downloaded the
application to having a ready dataset of artist and track objects to apply
playlist generation algorithms to. It will check what files exist in the
current directory to gauge your progress towards a ready-to-use dataset. The
first stage requires you to perform an iTunes library export. If you have not
done this yet, the function will prompt you and provide instructions. The next
stage is to call pylistxml.parse_itunes('Library.xml')
which saves an Itunes_Library object as 'itunes.db'. The third stage calls
get_lfm_info(datamgmt.load('itunes.db')) which performs the processes
described above, then saves a file 'incompleteartists.db'. The final stage is
to call process_info() which is also described above, resulting in an
'artists.db' file being generated.

  * **datamgmt.py**

    * save(data,filename) performs a pickle dump of the data into a file of
name filename using protocol 2.

    * load(filename) returns the unpickled contents of a file.

    * make_m3u(songs) accepts a list of Hybrid_Track objects and creates an
m3u8 (m3u8 is an m3u file encoded in UTF-8) playlist in the current directory
titled 'playlistnew.m3u'. Works for Windows.

    * make_m3u_osx(songs) does the same thing as make_m3u but formats the file
locations in a way that works with OSX.

  * **pylistxml.py**

    * Contains functions used to parse an itunes xml library export and store
that data in an Itunes_Library object.

  * **filters.py**

    * make_playlist(artists,songcount) accepts a list of artists and the
number of songs you want in your playlist. Makes choices in the way described
above as Algorithm 1. Returns a list of songs of songcount length.

    * make_playlist2(artists,songcount) Input and output identical to
make_playlist. Uses the selection algorithm described above as Algorithm 2.
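
The file-based staging that make_progress() is described as performing can be sketched as below. This is only an illustration of the idea of gauging progress from which files exist. The STAGES table and the next_step helper are hypothetical names, and the real function may order or word its checks differently. Only the file names and the calls quoted in the strings come from the description above.

```python
# Files named in the documentation, in the order they are produced,
# paired with the action that creates each one.
STAGES = [
    ("Library.xml", "export your library from iTunes"),
    ("itunes.db", "pylistxml.parse_itunes('Library.xml')"),
    ("incompleteartists.db", "get_lfm_info(datamgmt.load('itunes.db'))"),
    ("artists.db", "process_info()"),
]


def next_step(filenames):
    """Return the action for the next stage, or None when artists.db exists.

    filenames is any iterable of file names found in the working directory.
    """
    present = set(filenames)
    # Walk backwards to find the furthest stage whose output already exists.
    for index in range(len(STAGES) - 1, -1, -1):
        if STAGES[index][0] in present:
            if index + 1 < len(STAGES):
                return STAGES[index + 1][1]
            return None
    # Nothing has been produced yet, so start with the first stage.
    return STAGES[0][1]
```

Calling next_step(os.listdir('.')) after importing os would report the next action for the current directory.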


**Future Development:**

I would like to re-build the user interface for the application, allowing all actions currently possible from
the command-line to be possible through an intuitive GUI. I would also like to
look into using a SQL database to store all data used by the application.
Currently it only uses a SQL database to store cached results of API calls to
last.fm. I've found cPickle to be much faster at saving and loading large
numbers of entries than the sqlite3 module. I would like to create a
Pandora/Genius-style playlist generation algorithm which will factor similarity of
tracks to the previous track into the song choice decision-making. I would
also like to find a more efficient way to determine when new tracks and
artists are added to iTunes and then process those objects, adding them to the
'artists.db' database. It would be nice to find a way to integrate directly to
iTunes, without the need for the intermediate step of a library xml export.
I would also like to take song bit rates into
account when parsing the iTunes xml export. Currently, it accepts the first
item encountered for a particular artist name and track name, ignoring
any duplicate tracks, potentially ones with higher audio quality.


**Problem areas:**

The difference between unicode strings and ascii strings was a huge problem
area throughout development. However, I'm fairly certain that at this point
all bugs relating to unicode strings with characters that do not exist in
ascii have been resolved. I ran into issues with pickle where, after
performing a dump using protocol 0 or 1, files became unloadable. This has
something to do with unicode, but I'm really unsure of exactly what the issue
was. I ran into issues with pickling objects which stored references to the
cached results of their function calls using a Connection object referencing
the sqlite database. By disabling caching immediately before saving, the
Network field containing the Connection object is assigned None as a value,
allowing for pickle to save the objects.
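
For readers who hit the same pickling problem, the sketch below shows a related technique rather than the approach pylistfm takes: instead of disabling caching before each save, the object drops its connection inside `__getstate__`. The CachedClient class and its fields are hypothetical stand-ins, and the example assumes Python 3 with the standard sqlite3 module.

```python
import pickle
import sqlite3


class CachedClient:
    """Hypothetical stand-in for an object holding a live sqlite connection."""

    def __init__(self):
        self.name = "example"
        self.connection = sqlite3.connect(":memory:")

    def __getstate__(self):
        # Copy the instance state and blank out the unpicklable connection,
        # leaving the live object untouched.
        state = self.__dict__.copy()
        state["connection"] = None
        return state


client = CachedClient()
# Protocol 2 matches the protocol the documentation says datamgmt.save() uses.
restored = pickle.loads(pickle.dumps(client, protocol=2))
```

After the round trip, restored.connection is None while client.connection is still open, so the caller would need to reconnect before using any cached lookups.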

**Directions for use:**

  * Perform an iTunes library export (File -> Library -> Export Library from within iTunes)
and put the Library.xml file in the same directory as lfmgather.py.

  * Run lfmgather.py interactively in your python console of choice. 

  * Once you've loaded lfmgather.py, execute the following command: **make\_progress()**.

  * You can then watch as pylistfm performs the data
gathering necessary and advances from **Library.xml** to **itunes.db** to
**incompleteartists.db**, which is finally converted into **artists.db**.

This whole process may take several hours, depending on the size of your library.
Once lfmgather has created an artists.db file, you have all the necessary data stored locally to perform
playlist generation. At that point, any time you run lfmgather.py, you can
execute the commands: 

  * **artists = datamgmt.load('artists.db')**
  * **songs = filters.make\_playlist(artists, n)** or **songs =
filters.make\_playlist2(artists, n)** where n is equal to the number of songs
you want on the generated playlist. 
  * **datamgmt.make\_m3u\_osx(songs)** or **datamgmt.make\_m3u(songs)** (OS X or Windows, respectively)
  
which will result in an m3u8 playlist of those songs being saved as playlistnew.m3u in the current
working directory. Once you've called a make_playlist function, the tracks
which are chosen are removed from artists, so you'll have to reload artists.db
afterwards if you want songs from the first playlist to potentially end up in
any future playlists.


Please let me know if you run into any issues or bugs, or have suggestions, when
using this program. It's still very much a work in progress and there are
quite a few things I'd like to add or fix at this point. However, the primary
documented features *should* be fully functional, and the code wraps most
situations where I have encountered errors so far in try/except statements
while I work toward figuring out more specifically what causes those errors
so that I can correct them before they happen.

