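# metadata

**Repository Path**: lainyu/metadata

## Basic Information

- **Project Name**: metadata
- **Description**: This project aims to solve the problem of identifying datasets. People in my group regularly train on data, but there has been no tool to help them identify the origin of a training result.
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-08-24
- **Last Updated**: 2021-08-24

## Categories & Tags

**Categories**: Uncategorized

**Tags**: toolkit

## README

`v1 2021-08-19 Creator:Yuxin init`

`v1 2021-08-19 Creator:Yuxin update attributes`

# Metadata Overview

For our purposes, metadata is defined as a file that describes the main characteristics of an entire dataset. Every raw dataset and every training set should have a metadata file that describes the contents of its folder.

*PS: When I was at the National Astronomical Observatories (国家天文台) last year, they used metadata to manage astronomical data. When a dataset was uploaded to the cloud, its metadata was uploaded with it. The cloud storage read the metadata, generated a folder tree matching the structure it described, and saved the data there. The figure below shows the metadata template used at the observatory.*

![](metadata.png)

## Metadata design requirements

1. Metadata should use a common configuration data format such as YAML, XML, or JSON.
2. Metadata should be kept under 500k where possible, because larger files slow down reading. Each field must therefore be designed to carry enough information.
3. Metadata should be extensible, so that users can add their own key:value pairs.

## Metadata structure

The preliminary design is as follows:

1. An entry that starts with '\*' is **required**; an entry that starts with ':' **can be generated automatically**.
2. 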
Next come the field key and its description, and finally the data format.

- `''` denotes a string; dates must be written in the form `%Y-%m-%d %H:%M:%S`
- `{}` denotes a dictionary
- `[]` denotes a list

```
- *[name] dataset name ''
- *[creator] author or organization ''
- :[createDate] creation date '%Y-%m-%d %H:%M:%S'
- *[Instruments] model of the data collection instrument ''
- :[location] storage location
- Sets dataset {}
    - *[type] kind of data such as image or array, freely chosen ''
    - :[format] generated automatically from the data category, one of three values: raw data RAW_DATA, training set TRAIN_DATA, test set TEST_DATA ''
    - :[origin] source, generated automatically for categories such as training set and test set ''
    - [comments] notes ''
    - :[history] change history [{}]
        - [updated] time of change '%Y-%m-%d %H:%M:%S'
        - [creator] person who made the change ''
        - [changes] what was changed ''
- Comments notes ''
```

Note that the sample `create` output in this README stores the kind of data (for example `binary`) under `format` and the category (for example `RAW_DATA`) under `type`, the reverse of the list above.

## Usage

### Standalone mode

```
--Usage: python3 metadata.py [update/show] path_to_metadata
--Usage: python3 metadata.py [create] [raw/train/test] path_to_metadata
--Usage: or python3 metadata.py [copy] [-f] path_source_metadata path_source_destination
--For example: python3 metadata.py create raw ~/datasets/1/                start the guided creation of raw-data metadata
--For example: python3 metadata.py create train ~/datasets/1/train/        start the guided creation of training-set metadata
--For example: python3 metadata.py copy ~/datasets/1/, ~/datasets/2/       copy the metadata from ~/datasets/1/ to ~/datasets/2/
--For example: python3 metadata.py copy -f ~/datasets/1/, ~/datasets/2/    force-copy the metadata from ~/datasets/1/ to ~/datasets/2/, skipping the guided steps
```

#### Creating raw-data metadata

In standalone mode, create metadata for the **raw data** in the vehicle/1 folder with: `create raw ./vehicle/1`

```
* Enter dataset name: >? ROUTE SCARS DETECTOR
Enter creator name(Default value: hp) >? yuxin
Please enter create date(Default current datetime, format: %Y-%m-%d %H:%M:%S):>? 
Enter instrument name of data collector: >? radar with 8 pipes
* Enter dataset format(img/array/binary/etc..):>? binary
Some comments? >? running
New attributes to metadata(just enter json format string, 'q' to quit this step)?>? {"MyAttribute": [232123, 324555]}
New attributes: {'MyAttribute': [232123, 324555]}
New attributes to metadata(just enter json format string, 'q' to quit this step)? 
```

When the guide finishes, the tool displays the complete metadata and the path where it was saved:

```json
{
    "name": "ROUTE SCARS DETECTOR",
    "creator": "yuxin",
    "createDate": "2021-08-23 15:53:37",
    "instruments": "radar with 8 pipes",
    "location": "./vehicle/1/",
    "sets": {
        "format": "binary",
        "type": "RAW_DATA",
        "origin": "",
        "comments": "",
        "history": [
            {
                "updated": "2021-08-23 15:53:37",
                "creator": "yuxin",
                "changes": "Initialize metadata."
            }
        ]
    },
    "comments": "running",
    "MyAttribute": [
        232123,
        324555
    ]
}
Metadata created and saved at ./vehicle/1/metadata.json
==============================================================================================
Bye
```

#### Updating raw-data metadata

To modify metadata, use the command `update ./vehicle/1/`. The program first shows the current metadata of the raw data, then enters the guided update.

```
{
    "name": "ROUTE SCARS DETECTOR",
    "creator": "yuxin",
    "createDate": "2021-08-23 15:53:37",
    "instruments": "radar with 8 pipes",
    "location": "./vehicle/1/",
    "sets": {
        "format": "binary",
        "type": "RAW_DATA",
        "origin": "",
        "comments": "",
        "history": [
            {
                "updated": "2021-08-23 15:53:37",
                "creator": "yuxin",
                "changes": "Initialize metadata."
            }
        ]
    },
    "comments": "running",
    "MyAttribute": [
        232123,
        324555
    ]
}
Usage: history / attributes
Usage: creator/comments/name.. etc
Usage: sets type/origin/comments
Enter the key you want to update(q to quit): >? history
Enter creator name(Default value: hp) >? 
* Enter changes: >? UPDATE xxx parameters
Enter the key you want to update(q to quit): >? comments
Enter new value: >? it's sunny day
Enter the key you want to update(q to quit): >? sets type
* Enter dataset format(img/array/binary/etc..):>? img
Enter the key you want to update(q to quit): >? 
q
=============================================================================================
{
    "name": "ROUTE SCARS DETECTOR",
    "creator": "yuxin",
    "createDate": "2021-08-23 15:53:37",
    "instruments": "radar with 8 pipes",
    "location": "./vehicle/1/",
    "sets": {
        "format": "binary",
        "type": "img",
        "origin": "",
        "comments": "",
        "history": [
            {
                "updated": "2021-08-23 15:53:37",
                "creator": "yuxin",
                "changes": "Initialize metadata."
            },
            {
                "updated": "2021-08-23 15:58:46",
                "creator": "hp",
                "changes": "UPDATE xxx parameters"
            }
        ]
    },
    "comments": "it's sunny day",
    "MyAttribute": [
        232123,
        324555
    ]
}
Metadata updated and saved at ./vehicle/1/metadata.json
==============================================================================================
Bye
```

#### Creating training-set metadata

When training on a dataset, the tool first reads that dataset's metadata and inherits some of its fields, such as the instrument model. It then sets sets->origin to the raw data's location (its storage location) and saves the generated file in the folder where the training set is stored.

Here we create metadata in the ./vehicle/1/train folder, based on the raw data in ./vehicle/1: `create train ./vehicle/1/train`

```
Please indicate the path of matched raw data: >? ./vehicle/1/
* Enter dataset name: >? ROUTE SCARS DETECTOR TRAINED RESULT
Enter creator name(Default value: hp) >? 
Please enter create date(Default current datetime, format: %Y-%m-%d %H:%M:%S):>? 
* Enter dataset format(img/array/binary/etc..):>? array
Some comments? >? no
New attributes to metadata(just enter json format string, 'q' to quit this step)?>? {"nParam": "2c", "mParam": [3, 9, -1, 0]}
New attributes: {'nParam': '2c', 'mParam': [3, 9, -1, 0]}
New attributes to metadata(just enter json format string, 'q' to quit this step)? 
=============================================================================================
{
    "name": "ROUTE SCARS DETECTOR TRAINED RESULT",
    "creator": "hp",
    "createDate": "2021-08-23 16:11:59",
    "instruments": "radar",
    "location": "./vehicle/1/train/",
    "sets": {
        "format": "array",
        "type": "TRAIN_DATA",
        "origin": "./vehicle/1/",
        "comments": "",
        "history": [
            {
                "updated": "2021-08-23 16:11:59",
                "creator": "hp",
                "changes": "Initialize metadata."
            }
        ]
    },
    "comments": "no",
    "nParam": "2c",
    "mParam": [
        3,
        9,
        -1,
        0
    ]
}
Metadata created and saved at ./vehicle/1/train/metadata.json
==============================================================================================
Bye

Process finished with exit code 0
```

#### Copying metadata

If you do not want to go through the guided create process repeatedly, use the copy command to generate metadata quickly: `copy ./vehicle/1 ./vehicle/2`. This command only asks for the creator. To skip the guide entirely, force the copy with -f: `copy -f ./vehicle/1 ./vehicle/2`.

```
Enter creator name(Default value: hp) >? yuxin
=============================================================================================
{
    "name": "ROUTE SCARS DETECTOR",
    "creator": "hp",
    "createDate": "2021-08-23 16:28:10",
    "instruments": "radar",
    "location": "./vehicle/2/",
    "sets": {
        "format": "binary",
        "type": "RAW_DATA",
        "origin": "",
        "comments": "OK",
        "history": [
            {
                "updated": "2021-08-23 16:06:47",
                "creator": "hp",
                "changes": "Initialize metadata."
            },
            {
                "updated": "2021-08-23 16:08:09",
                "creator": "yuxin",
                "changes": "update xx parameters"
            },
            {
                "updated": "2021-08-23 16:28:15",
                "creator": "yuxin",
                "changes": "COPY FROM ./vehicle/2/metadata.json"
            }
        ]
    },
    "comments": "no",
    "myAttributes": 12343
}
Metadata created and saved at ./vehicle/2/metadata.json
==============================================================================================
Bye
```

### Cloud mode

If we get a cloud server one day, datasets could be uploaded via URL. The server would check whether each uploaded dataset contains valid metadata and reject the upload if it does not. Users could also remotely list the available datasets and view the details of any one of them.
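
The validity check described for cloud mode is not specified in this README, so the following is only a minimal sketch of what it might look like. It assumes that "valid" means the fields marked `*` in the field list are present and non-empty, and that `createDate`, when present, matches `%Y-%m-%d %H:%M:%S`. The names `validate_metadata` and `accept_upload` are hypothetical and are not part of `metadata.py`.

```python
import json
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"

# Assumption: required keys are taken from the '*' markers in the field list,
# spelled the way they appear in the sample metadata.json files.
REQUIRED_TOP_LEVEL = ("name", "creator", "instruments")
REQUIRED_IN_SETS = ("type",)


def validate_metadata(metadata):
    """Return a list of problems; an empty list means the checks passed."""
    if not isinstance(metadata, dict):
        return ["metadata is not a JSON object"]

    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if not metadata.get(key):
            problems.append(f"missing required field: {key}")

    sets = metadata.get("sets")
    if not isinstance(sets, dict):
        problems.append("missing required section: sets")
    else:
        for key in REQUIRED_IN_SETS:
            if not sets.get(key):
                problems.append(f"missing required field: sets.{key}")

    # createDate may be generated automatically, so it is only checked
    # when it is present.
    create_date = metadata.get("createDate")
    if create_date is not None:
        try:
            datetime.strptime(create_date, DATE_FORMAT)
        except (TypeError, ValueError):
            problems.append("createDate does not match " + DATE_FORMAT)

    return problems


def accept_upload(metadata_text):
    """Hypothetical server-side gate: True only if the metadata text is valid."""
    try:
        metadata = json.loads(metadata_text)
    except json.JSONDecodeError:
        return False
    return not validate_metadata(metadata)
```

A server could call `accept_upload` on the text of the uploaded `metadata.json` and refuse the dataset when it returns `False`; `validate_metadata` returns the individual problems so they can be reported back to the uploader.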