Galaxy has been suggested internally: why not start with open source & build what's missing, instead of building everything from scratch?

Galaxy has been tried: see the Refinery project at HMS/Park Lab
- not perfect, either: lots of custom coding was needed
- had to go outside Galaxy to handle data loads
- no workflow restarting from a partially complete state
  - only restart from the beginning
- no workflows of workflows
- no job avoidance: huge for TCGA ...
  - data change each month, but only in drips & drabs
- significant manual interaction required for large #s of files (100s of files, GBs in size)
  - BUT we have 1000s of files, TBs in size
- freezing histories
  - sample-based or file-based?
- scaling:
  - 150K jobs in Galaxy in 1 month (June 2012)
  - 250K jobs in FH in 1 week, plus >= 100K avoided
  - i.e. >= ~350K jobs per week

How many million jobs have been run through Synapse in 1 month?
Has it been tested on all diseases, all datatypes, all analyses for all AWGs?
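
The "job avoidance" point above can be sketched in a few lines: a job is skipped when the same tool, parameters, and input contents have already been run. This is only an illustration under stated assumptions. The names `job_fingerprint`, `JobCache`, and `count_bytes` are hypothetical and are not taken from Galaxy, FH, or Synapse, and a real system would persist the cache and track tool versions as well.

```python
import hashlib
import json


def job_fingerprint(tool, params, input_digests):
    """Hash the tool name, parameters, and input content digests into one key."""
    payload = json.dumps(
        {"tool": tool, "params": params, "inputs": list(input_digests)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class JobCache:
    """Hypothetical in-memory record of completed jobs, keyed by fingerprint."""

    def __init__(self):
        self.results = {}
        self.ran = 0
        self.avoided = 0

    def run(self, tool, params, inputs, func):
        # Fingerprint the input *contents*, so unchanged files are recognized
        # even when they arrive again in a later data snapshot.
        digests = [hashlib.sha256(data).hexdigest() for data in inputs]
        key = job_fingerprint(tool, params, digests)
        if key in self.results:
            self.avoided += 1  # identical job already done: reuse its result
            return self.results[key]
        self.ran += 1
        result = func(inputs, params)
        self.results[key] = result
        return result


def count_bytes(inputs, params):
    """Stand-in analysis (hypothetical): total size of the input data."""
    return sum(len(data) for data in inputs)


# Two monthly snapshots (made-up data) in which only one sample changes.
cache = JobCache()
june = {"s1": b"AAAA", "s2": b"CCCC", "s3": b"GGGG"}
july = dict(june, s3=b"GGGGTT")
for snapshot in (june, july):
    for sample, data in snapshot.items():
        cache.run("count_bytes", {"sample": sample}, [data], count_bytes)

print(cache.ran, cache.avoided)  # prints "4 2"
```

In this toy run the second snapshot re-executes only the one sample whose data changed, which is the behavior that matters when data change each month only in drips & drabs.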