TWiki
>
IVOA Web
>
TWikiUsers
>
DaveMorris
>
VoluteTransfer20150730
(revision 4) (raw view)
Edit
Attach
---+ Volute transfer Options for transferring [[https://volute.googlecode.com][Volute]] from [[https://code.google.com/][GoogleCode]] to [[https://github.com/][GitHub]]. ---+ Headline figures, based on disc usage ---+++ volute-complete - 825M Svn checkout of everything in the repository. <verbatim> svn checkout https://volute.googlecode.com/svn/trunk/ volute-complete du -h volute-complete > complete-original.txt </verbatim> ---+++ volute-noextern - 764M Svn checkout, without resolving the extern references. <verbatim> svn checkout --ignore-externals https://volute.googlecode.com/svn/trunk/ volute-noextern du -h volute-noextern > noextern-original.txt </verbatim> ---+++ volute-export - 391M Svn export, a snapshot of the current state with no commit history. <verbatim> svn export --ignore-externals https://volute.googlecode.com/svn/trunk/ volute-export du -h volute-export > export-original.txt </verbatim> Of the 391M in the exported snapshot, the top 8 projects are : * theory 220M * dm 126M * registry 26M * grid 6M * vocabularies 3M * samp 3M * votable 2M * ivoapub 2M ---+ Maximal transfer If we just press the 'export to !GitHub' button, then everything will get transferred, including the commit history. I have seen this work on a *small* project, and everything just worked. On a large project like ours the process will probably take a while. With a total size of 825M we are close to the !GitHub 1Gbyte per repository limit, which may cause problems later on. The only unusual thing to watch for is that the email telling you the process has completed will be sent to the email address linked to your !GitHub account, not to your Google account. See: [[https://code.google.com/export-to-github/][GitHubExporter]] ---+ IVOA organization If we want our !GitHub repository to be owned by the [[https://github.com/ivoa][IVOA organization]] in !GitHub, do the transfer to your private account, and then transfer the repository afterwards. See: [[https://code.google.com/p/support-tools/wiki/GitHubExporterFAQ#Can_I_migrate_a_Google_Code_repository_to_a_GitHub_Organization?][Migrate to an Organization]] ---+ Commit history It is important to note that the automated 'export to !GitHub' tool is the only realistic way to preserve the commit history of the svn reposiroty. All of the alternative suggestions outlined below rely on a manual process of exporting the contents to a local copy and then importing some or all of it into one or more !GitHub repositories. If we decide to go for one of these alternatives then it is not practical to try to preserve the svn commit history. ---+ Snapshot transfer If we skip the svn history and just take a snapshot of where we are now, then we have less than 400M to transfer. We would have to do the transfer manually, exporting a local copy from svn, and then importing it into a new !GitHub repository. <verbatim> git clone https://github.com/YOUR-USERNAME/YOUR-REPOSITORY local-repo svn export --ignore-externals https://volute.googlecode.com/svn/trunk/ local-repo pushd local-repo git add . git commit -m 'Initial import from svn' git push popd </verbatim> ---+ Space limits !GitHub don't have hard and fast limits on the size of a repository. <verbatim> We *recommend* repositories be kept under 1GB each. This limit is easy to stay within if large files are kept out of the repository. If your repository exceeds 1GB, you might receive a polite email from GitHub Support requesting that you reduce the size of the repository to bring it back down. </verbatim>(emphasis mine) See: [[https://help.github.com/articles/what-is-my-disk-quota/][What is my disk quota ?]] I contacted !GitHub to see if there would be an issue with us using more than 1Gbyte of space. I got the following reply from a member of their help team : <verbatim> Hi Dave, Thanks for reaching out! We strongly recommend keeping repositories under 1GB in size. Additionally, to ensure that repository performance is optimal, only files less than 100MB in size can be pushed to GitHub.com. More information about this can be found here: https://help.github.com/articles/what-is-my-disk-quota The good news is that in order to make working with large files better, we recently published an extension to Git called Git Large File Storage, and support for Git LFS is currently in early access on GitHub.com. You can check it out at http://git-lfs.github.com and sign up for early access at https://github.com/early_access/large_file_storage I hope this information helps, please let us know if you have any questions! Cheers, Rachel </verbatim> ---+ Large files I suspect that due to the way that we use volute, the [[https://git-lfs.github.com/][Large File Storage]] extension will be of limited value to us. In the current version of the Git LFS extension you can't select which files should be stored separately based on file size. The file selection criteria is based purely on file path and type. A number of people have been asking for selection by size, but it does not look like it will be available soon. * [[https://github.com/github/git-lfs/issues/125][Add files to git-media by attribute such as size?]] * [[https://github.com/github/git-lfs/issues/282][Support tracking by file size]] This means that in order for it to be useful in reducing the size of our repository, we would need to identify which files we wanted to be handles using the LFS extension *before* they were added to the repositiory. In reality, some of our uses would be extremely careful about making sure every =pdf= and =doc= file in their project was listed, even the ones that were less than 1Mbyte. Other users would just want to be able to commit and push a whole directory tree and leave it up to the software to sort out which files need to be handled differently. !GitHub has a maximum file size limit of 100M per file. The LFS extension was designed to enable Git to handle things like binary image files, e.g. =jpeg=, =png=, =svg=. Using the file path and type to identify which files should be treated differently. Looking the files in the current volute repository, we have a wide variety of different file types and sizes, and it would be difficlut to define a reliable selection criteria to identify which files should be handled by LFS. * We have no files larger than 100M bytes. * We have no files larger than 50M bytes. * We have four files larger than 10M bytes, all of them in the theory project. * projects/theory/snap/simtap/PDR143/PDR143-2.vo-urp * projects/theory/snap/simtap/PDR143/html/PDR143-2.html * projects/theory/snap/simtap/PDR143/tap/PDR143-2_tap_tableset.xml * projects/theory/snap/simtap/PDR143/tap/PDR143-2_votable.xml * We have a few files larger than 5M bytes, most of them in the theory project. * projects/dm/vo-dml/libs/eclipselink.jar * projects/theory/snap/simtap/PDR143/PDR143-2.vo-urp * projects/theory/snap/simtap/PDR143/html/PDR143-2.html * projects/theory/snap/simtap/PDR143/tap/PDR143-2_tap_tableset.xml * projects/theory/snap/simtap/PDR143/tap/postgres/PDR143-2_create_tap_schema.sql * projects/theory/snap/simtap/PDR143/tap/mssqlserver/PDR143-2_create_tap_schema.sql * projects/theory/snap/simtap/PDR143/tap/PDR143-2_votable.xml * projects/theory/snap/simtap/PDR143/tap/PDR143-2_vodataservice.xml * projects/theory/snapdm/input/other/sourceDM/IVOACatalogueDataModel.pdf * We have 70 files larger than 1M bytes. * Everything else is smaller than 1M byte. Note that many of our largest files are 10Mbyte+ =html= and =xml= files, presumably generated by our modelling tools. Equally, some of our smallest files are =html= and =xml= files, and we would not want any of the =html= and =xml= source files for our standards documents to be stored externally as binary files. ---+ Project types Looking at the current contents volute, we have four different project types. ---++ Theory projects It looks like the three theory projects contain a relativley small number of human edited source files, and the majority of the space is taken up by machine generated files. * projects/theory - 220M * snap - 108M * snapdm - 109M * simdal - 3.3M There is a good case for exporting each of the three theory projects as separate !GitHub repositories. Even without using the LFS extension to manage the larger files, these projects would all be under the recomended 1Gbyte per repository limit. ---++ Data models Four of the data model projects are directly related to the standard documents defining the corresponding data model. * projects/dm * CubeDM-1.0 - 8.1M * ImageDM - 3.6M * provenance - 5.8M * SpectralDM-2.0 - 7.3M The majority of the space is taken up by a mixture of medium sized (1M < s < 10M) =doc=, =pdf= and =png= files. The fith data model project is for the VO Data Modelling Language, VO-DML. This project accounts for over 100M of the 126M of space used by the data model projects, and is the third largest project in the volute repository. * projects/dm - 126M * .... * vo-dml - 101M Again, the majority of the space is taken up by a mixture of medium sized (1M < s < 10M) =doc=, =pdf= and =png= files. Although this project is related to the VO-DML and UTYPE specifications, there is a case for exporting it as a separate separate !GitHub repository. In addition to the documents for the VO-DML and UTYPE specifications the vo-dml project also contains definitions of the models themselves along with the source code for the tools for validating the models and for building derived data products from them. ---++ VOSpace service We have one project that contains code for a program, donated by Rick Wagner at UC San Diego. * projects/grid/vospace/php_endpoint * size : 1.5M * type : PHP web service * lang : php <verbatim> = PHP VOSpace Endpoint = VOSpace endpoint building on top of the [http://www.irods.org iRODS] client, Prods. Requires Prods, which is part of the iRODS distributions (under clients). Also uses [http://simpletest.sf.net SimpleTest] for unit tests. Configure the locations in config.inc. Rick Wagner http://lca.ucsd.edu/projects/rpwagner rwagner@physics.ucsd.edu </verbatim> As a self-contained source code project there is a case good case for exporting this project as separate !GitHub repository of its own. ---++ Vocabularies The vocabularies project contains the build tree for the IVOA vocabulary SKOS files. Although this project is relatively small, 3.4M, it is not directly related to an IVOA document or standard. As a self-contained source code project there is a case good case for exporting this project as separate !GitHub repository of its own. ---++ Documents and standards Everything else in our repository are either source text for our documents or tools for creating documents. If we wanted to we could use the LFS extension to process all of the =doc=, =pdf= and =jpeg= files separately. However, this would not reduce the size of a local clone of the repository, nor the time it would take to download it. ---+ Proposed structure If we take a copy of the exported snapshot and split out the projects identified above as candidates for separate !GitHub repositories. <verbatim> mkdir github-repos pushd github-repos cp -r ../volute-export local-temp mv local-temp/projects/theory ivoa-theory mv local-temp/projects/dm/vo-dml ivoa-dml mv local-temp/projects/vocabularies ivoa-vocabularies mv local-temp/projects/grid/vospace/php_endpoint php-vospace mv local-temp/projects ivoa-documents popd du -h github-repos > github-repos.txt </verbatim> The we get the following set of candidate !GitHub repositories: * github-repos - 391M * php-vospace - 1.5M * ivoa-vocabularies - 3.4M * ivoa-documents - 66M * ivoa-dml - 101M * ivoa-theory - 220M If we split the three theory projects into separate !GitHub repositories, then we get the following: * github-repos - 391M * php-vospace - 1.5M * ivoa-vocabularies - 3.4M * ivoa-documents - 66M * ivoa-dml - 101M * ivoa-snap - 108M * ivoa-snapdm - 109M * ivoa-simdal - 3.3M ---+ Historical documents It would be possible to further reduce the size of the ivoa-documents !GitHub repository by excluding the historical versions of the documents stored in the current repository. ---+ References * [[http://google-opensource.blogspot.co.uk/2015/03/farewell-to-google-code.html][Farewell to google code]] * [[https://code.google.com/export-to-github/][GitHubExporter]] * [[https://code.google.com/p/support-tools/wiki/GitHubExporterFAQ][GitHubExporter FAQ]] * [[https://code.google.com/p/support/wiki/ReadOnlyTransition][ReadOnly transition]] * [[https://git-lfs.github.com/][Git Large File Storage]] * [[https://github.com/github/git-lfs/blob/master/docs/spec.md][Git LFS Specification]]
Edit
|
Attach
|
Watch
|
P
rint version
|
H
istory
:
r7
<
r6
<
r5
<
r4
<
r3
|
B
acklinks
|
V
iew topic
|
Raw edit
|
More topic actions...
Topic revision: r4 - 2015-07-31
-
DaveMorris
IVOA
Log in
or
Register
IVOA.net
Wiki Home
WebChanges
WebTopicList
WebStatistics
Twiki Meta & Help
IVOA
Know
Main
Sandbox
TWiki
TWiki intro
TWiki tutorial
User registration
Notify me
Working Groups
Applications
Data Access Layer
Data Model
Distributed Services & Protocols
Registry
Semantics
Interest Groups
Data Curation
Education
Knowledge Discovery
High Energy
Operations
Radio Astronomy
Solar System
Time Domain
Committees
Stds&Procs
www.ivoa.net
Documents
Events
Members
XML Schema
Copyright © 2008-2025 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki?
Send feedback