OrganiK project: working on test data collection
As blogged in January, Gunnar, Remzi and I are working for DFKI on the OrganiK project. As true hard bloggin' scientists, we keep on reporting.
In the next two weeks, I will gather an exhaustive test data collection of texts that we can use for ontology learning. I hope to gather around 10,000 documents from various sources that overlap in topic. We need e-mails, office documents (contracts, etc.) and news documents. There are a lot of test data sets out there; the question now is which one to pick. Also, in OrganiK we have SME partners who could provide some data.
After this, the next step will be to create a taxonomy learning module that analyses the documents and semi-automatically (or fully automatically) creates a taxonomy from them for future classification. If it's fully automatic, I expect that the taxonomy will have probabilistic elements in it ("it thinks that this is a customer, but only 60%"). If we work with a probabilistic model throughout the whole project, we can rank everything all the time, and maybe this will reduce human work. We will see.
Does anyone have experience with taxonomies that have a weight added? It's similar to a TF/IDF rank.
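To make the "only 60%" idea a bit more concrete, here is a minimal sketch of how ranked, probabilistic assignments could cut down human work. Everything in it is an assumption for illustration: the `Assignment` class, the `rank` function, the 0.8 threshold and the example terms are made up, and nothing of this exists in OrganiK yet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    """A hypothetical 'term belongs to concept' guess with a confidence."""
    term: str
    concept: str
    confidence: float  # between 0.0 and 1.0


def rank(assignments, threshold=0.8):
    """Sort guesses by confidence and split them at an assumed threshold.

    Guesses at or above the threshold are accepted automatically;
    the rest are left for a human to review, most confident first.
    """
    ordered = sorted(assignments, key=lambda a: a.confidence, reverse=True)
    accepted = [a for a in ordered if a.confidence >= threshold]
    review = [a for a in ordered if a.confidence < threshold]
    return accepted, review


# Made-up example data, including the "customer, but only 60%" case.
assignments = [
    Assignment("ACME Ltd", "Customer", 0.60),
    Assignment("invoice", "Document", 0.95),
    Assignment("J. Smith", "Employee", 0.82),
]
accepted, review = rank(assignments)
```

With this split, a person only has to look at the `review` list instead of every assignment, which is where the saving in human work would come from.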
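For comparison, this is roughly what I mean by a TF/IDF rank: a small sketch using the common textbook variant (relative term frequency times the logarithm of the inverse document frequency). The function name `tf_idf` and the toy documents are assumptions for illustration, and other weighting variants exist.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute a TF/IDF weight per term and document.

    docs is a list of token lists; the result is a list of
    dicts mapping each term to its weight in that document.
    """
    n = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append(
            {t: (c / total) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return weights


# Toy documents, made up for this example.
docs = [
    "customer signs contract".split(),
    "customer sends mail".split(),
    "news about contract law".split(),
]
w = tf_idf(docs)
```

A term that occurs in only one document ("signs") gets a higher weight there than a term shared by several documents ("customer"). A weight attached to a taxonomy concept could be read in the same way, as a score to rank by.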
leobard - 21. Apr, 13:37





