
Tuesday, 21. April 2009

OrganiK project: working on test data collection

As blogged in January, Gunnar, Remzi and I are working for DFKI on the OrganiK project. As true hard bloggin' scientists, we keep on reporting.

Over the next two weeks, I will gather a comprehensive test data collection of texts that we will use for ontology learning. I hope to gather around 10,000 documents from various sources with overlapping topics. We need e-mails, office documents (contracts, etc.) and news documents. There are a lot of test data sets out there; the question now is which one to pick. Also, in OrganiK we have SME partners who could provide some data.

After this, the next step will be to create a taxonomy learning module that analyses the documents and semi-automatically (or fully automatically) creates a taxonomy from them for future classification. If it's fully automatic, I expect that the taxonomy will have probabilistic elements in it ("it thinks that this is a customer, but only 60%"). If we work with a probabilistic model throughout the whole project, we can rank everything all the time; maybe this will reduce human work. We will see.
Does anyone have experience with taxonomies that have a weight added? It's similar to a TF/IDF rank.
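
To make the idea of a weighted taxonomy a bit more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not the OrganiK module: the names `WeightedLabel`, `rank_labels` and `tf_idf` are made up for this example, the confidence values are invented (0.6 mirrors the "customer, but only 60%" case), and `tf_idf` is the plain textbook formula (term frequency times log(N / document frequency)) without smoothing.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class WeightedLabel:
    """A taxonomy concept assigned to an item, with a confidence in [0, 1]."""
    concept: str
    confidence: float


def rank_labels(labels, threshold=0.0):
    """Keep labels with confidence >= threshold, most confident first."""
    kept = [label for label in labels if label.confidence >= threshold]
    return sorted(kept, key=lambda label: label.confidence, reverse=True)


def tf_idf(term, doc_tokens, corpus):
    """Plain TF-IDF of `term` in one tokenised document.

    `corpus` is a list of tokenised documents (lists of strings).
    Returns 0.0 if the document is empty or the term occurs nowhere.
    """
    if not doc_tokens:
        return 0.0
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    df = sum(1 for tokens in corpus if term in tokens)
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


# Hypothetical example data, invented for illustration only.
labels = [
    WeightedLabel("supplier", 0.25),
    WeightedLabel("customer", 0.6),
    WeightedLabel("competitor", 0.1),
]
corpus = [
    ["contract", "customer", "invoice"],
    ["customer", "meeting"],
    ["news", "market"],
]

for label in rank_labels(labels, threshold=0.2):
    print(f"{label.concept}: {label.confidence:.0%}")

for term in ("contract", "customer"):
    print(term, round(tf_idf(term, corpus[0], corpus), 3))
```

With a threshold like this, a human would only need to review the low-confidence assignments, which is where the hoped-for reduction in manual work would come from. A term that appears in fewer documents ("contract") gets a higher TF/IDF weight than one that appears in many ("customer"), and a taxonomy weight could be ranked in the same way.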
